PREFACE

The  purpose  of  this  monograph  is  to  give  an  axiomatic 
foundation  for  the  theory  of  probability.  The  author  set  himself 
the  task  of  putting  in  their  natural  place,  among  the  general 
notions  of  modern  mathematics,  the  basic  concepts  of  probability 
theory — concepts  which  until  recently  were  considered  to  be  quite 
peculiar. 

This  task  would  have  been  a  rather  hopeless  one  before  the 
introduction  of  Lebesgue's  theories  of  measure  and  integration. 
However,  after  Lebesgue's  publication  of  his  investigations,  the 
analogies  between  measure  of  a  set  and  probability  of  an  event, 
and  between  integral  of  a  function  and  mathematical  expectation 
of  a  random  variable,  became  apparent.  These  analogies  allowed 
of  further  extensions;  thus,  for  example,  various  properties  of 
independent  random  variables  were  seen  to  be  in  complete  analogy 
with the corresponding properties of orthogonal functions. But
if probability theory was to be based on the above analogies, it
still was necessary to make the theories of measure and integration
independent of the geometric elements which were in the
foreground with Lebesgue. This has been done by Fréchet.

While  a  conception  of  probability  theory  based  on  the  above 
general  viewpoints  has  been  current  for  some  time  among  certain 
mathematicians,  there  was  lacking  a  complete  exposition  of  the 
whole system, free of extraneous complications. (Cf., however,
the book by Fréchet, [2] in the bibliography.)

I  wish  to  call  attention  to  those  points  of  the  present  exposition 
which  are  outside  the  above-mentioned  range  of  ideas  familiar  to 
the specialist. They are the following: probability distributions
in infinite-dimensional spaces (Chapter III, § 4); differentiation
and integration of mathematical expectations with respect to a
parameter (Chapter IV, § 5); and especially the theory of
conditional probabilities and conditional expectations (Chapter V).
It should be emphasized that these new problems arose, of
necessity, from some perfectly concrete physical problems.¹


1 Cf., e.g., the paper by M. Leontovich quoted in footnote 6 on p. 46; also the
joint paper by the author and M. Leontovich, Zur Statistik der
kontinuierlichen Systeme und des zeitlichen Verlaufes der physikalischen
Vorgänge, Phys. Jour. of the USSR, Vol. 3, 1933, pp. 35-63.



The sixth chapter contains a survey, without proofs, of some
results of A. Khinchine and the author on the limitations on the
applicability of the ordinary and of the strong law of large
numbers. The bibliography contains some recent works which should
be of interest from the point of view of the foundations of the
subject.

I  wish  to  express  my  warm  thanks  to  Mr.  Khinchine,  who 
has  read  carefully  the  whole  manuscript  and  proposed  several 
improvements. 

Kljasma  near  Moscow,  Easter  1933. 

A. Kolmogorov

Chapter I


ELEMENTARY  THEORY  OF  PROBABILITY 

We  define  as  elementary  theory  of  probability  that  part  of 
the  theory  in  which  we  have  to  deal  with  probabilities  of  only  a 
finite  number  of  events.  The  theorems  which  we  derive  here  can 
be  applied  also  to  the  problems  connected  with  an  infinite  number 
of random events. However, when the latter are studied,
essentially new principles are used. Therefore the only axiom of the
mathematical  theory  of  probability  which  deals  particularly  with 
the  case  of  an  infinite  number  of  random  events  is  not  introduced 
until  the  beginning  of  Chapter  II  (Axiom  VI). 

The  theory  of  probability,  as  a  mathematical  discipline,  can 
and  should  be  developed  from  axioms  in  exactly  the  same  way 
as  Geometry  and  Algebra.  This  means  that  after  we  have  defined 
the  elements  to  be  studied  and  their  basic  relations,  and  have 
stated  the  axioms  by  which  these  relations  are  to  be  governed, 
all  further  exposition  must  be  based  exclusively  on  these  axioms, 
independent  of  the  usual  concrete  meaning  of  these  elements  and 
their  relations. 

In  accordance  with  the  above,  in  §  1  the  concept  of  a  field  of 
probabilities  is  defined  as  a  system  of  sets  which  satisfies  certain 
conditions. What the elements of this set represent is of no
importance in the purely mathematical development of the theory
of  probability  (cf.  the  introduction  of  basic  geometric  concepts 
in  the  Foundations  of  Geometry  by  Hilbert,  or  the  definitions  of 
groups,  rings  and  fields  in  abstract  algebra). 

Every  axiomatic  (abstract)  theory  admits,  as  is  well  known, 
of  an  unlimited  number  of  concrete  interpretations  besides  those 
from  which  it  was  derived.  Thus  we  find  applications  in  fields  of 
science  which  have  no  relation  to  the  concepts  of  random  event 
and  of  probability  in  the  precise  meaning  of  these  words. 

The  postulational  basis  of  the  theory  of  probability  can  be 
established  by  different  methods  in  respect  to  the  selection  of 
axioms  as  well  as  in  the  selection  of  basic  concepts  and  relations. 
However, if our aim is to achieve the utmost simplicity both in
the system of axioms and in the further development of the
theory, then the postulational concepts of a random event and
its probability seem the most suitable. There are other
postulational systems of the theory of probability, particularly
those in which the concept of probability is not treated as one of
the basic concepts, but is itself expressed by means of other
concepts.¹ However, in that case, the aim is different, namely, to
tie up as closely as possible the mathematical theory with the
empirical development of the theory of probability.

§ 1. Axioms²

Let $E$ be a collection of elements $\xi, \eta, \zeta, \ldots$, which we
shall call elementary events, and $\mathfrak{F}$ a set of subsets of $E$;
the elements of the set $\mathfrak{F}$ will be called random events.

I. $\mathfrak{F}$ is a field³ of sets.

II. $\mathfrak{F}$ contains the set $E$.

III. To each set $A$ in $\mathfrak{F}$ is assigned a non-negative real
number $P(A)$. This number $P(A)$ is called the probability of the
event $A$.

IV. $P(E)$ equals 1.

V. If $A$ and $B$ have no element in common, then

$$P(A + B) = P(A) + P(B).$$

A system of sets, $\mathfrak{F}$, together with a definite assignment of
numbers $P(A)$, satisfying Axioms I-V, is called a field of
probability.

Our system of Axioms I-V is consistent. This is proved by the
following example. Let $E$ consist of the single element $\xi$ and let
$\mathfrak{F}$ consist of $E$ and the null set $0$. $P(E)$ is then set equal
to 1 and $P(0)$ equals 0.


1 For example, R. von Mises [1] and [2] and S. Bernstein [1].

2 The reader who wishes from the outset to give a concrete meaning to the
following axioms is referred to § 2.

3 Cf. Hausdorff, Mengenlehre, 1927, p. 78. A system of sets is called a field
if the sum, product, and difference of two sets of the system also belong to the
same system. Every non-empty field contains the null set $0$. Using Hausdorff's
notation, we designate the product of $A$ and $B$ by $AB$; the sum by $A + B$ in
the case where $AB = 0$; and in the general case by $A \dotplus B$; the
difference of $A$ and $B$ by $A - B$. The set $E - A$, which is the complement
of $A$, will be denoted by $\bar{A}$. We shall assume that the reader is
familiar with the fundamental rules of operations of sets and their sums,
products, and differences. All subsets of $\mathfrak{F}$ will be designated by
Latin capitals.



Our system of axioms is not, however, complete, for in various
problems in the theory of probability different fields of
probability have to be examined.

The Construction of Fields of Probability. The simplest fields
of probability are constructed as follows. We take an arbitrary
finite set $E = \{\xi_1, \xi_2, \ldots, \xi_k\}$ and an arbitrary set
$\{p_1, p_2, \ldots, p_k\}$ of non-negative numbers with the sum
$p_1 + p_2 + \ldots + p_k = 1$. $\mathfrak{F}$ is taken as the set of all
subsets in $E$, and we put

$$P\{\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_\lambda}\} = p_{i_1} + p_{i_2} + \ldots + p_{i_\lambda}.$$

In such cases, $p_1, p_2, \ldots, p_k$ are called the probabilities of the
elementary events $\xi_1, \xi_2, \ldots, \xi_k$ or simply elementary
probabilities. In this way are derived all possible finite fields of
probability in which $\mathfrak{F}$ consists of the set of all subsets of
$E$. (The field of probability is called finite if the set $E$ is
finite.) For further examples see Chap. II, § 3.
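The construction just described can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the names
`power_set`, `make_finite_field` and `satisfies_axioms`, as well as the
three elementary events and their probabilities 1/2, 1/3, 1/6, are
hypothetical and do not come from the text. Exact fractions are used so
that Axioms I-V can be checked without rounding error.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(elements):
    """Return every subset of `elements` as a frozenset."""
    items = list(elements)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def make_finite_field(elementary_probabilities):
    """Build (E, F, P) from a dict {elementary event: probability}."""
    if any(p < 0 for p in elementary_probabilities.values()):
        raise ValueError("elementary probabilities must be non-negative")
    if sum(elementary_probabilities.values()) != 1:
        raise ValueError("elementary probabilities must sum to 1")
    E = frozenset(elementary_probabilities)
    F = power_set(E)  # the system of all subsets of E

    def P(event):
        # P{xi_i1, ..., xi_il} = p_i1 + ... + p_il
        return sum((elementary_probabilities[xi] for xi in event), Fraction(0))

    return E, F, P


def satisfies_axioms(E, F, P):
    """Check Axioms I-V by brute force over all pairs of events."""
    field = set(F)
    closed = all(                                   # Axiom I: F is a field
        (A | B) in field and (A & B) in field and (A - B) in field
        for A in F for B in F
    )
    contains_E = E in field                         # Axiom II
    non_negative = all(P(A) >= 0 for A in F)        # Axiom III
    normalized = P(E) == 1                          # Axiom IV
    additive = all(                                 # Axiom V (disjoint A, B)
        P(A | B) == P(A) + P(B) for A in F for B in F if not (A & B)
    )
    return closed and contains_E and non_negative and normalized and additive


# Hypothetical elementary probabilities, chosen only for illustration.
E, F, P = make_finite_field(
    {"xi1": Fraction(1, 2), "xi2": Fraction(1, 3), "xi3": Fraction(1, 6)}
)
```

With three elementary events the system of all subsets has 2^3 = 8
members, and `satisfies_axioms(E, F, P)` confirms that the resulting
triple is a field of probability in the sense of § 1.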

§ 2. The Relation to Experimental Data⁴

We  apply  the  theory  of  probability  to  the  actual  world  of 
experiments  in  the  following  manner: 

1) There is assumed a complex of conditions, $\mathfrak{S}$, which allows
of any number of repetitions.

2) We study a definite set of events which could take place as
a result of the establishment of the conditions $\mathfrak{S}$. In
individual cases where the conditions are realized, the events occur,
generally, in different ways. Let $E$ be the set of all possible variants
$\xi_1, \xi_2, \ldots$ of the outcome of the given events. Some of these
variants might in general not occur. We include in set $E$ all the
variants which we regard a priori as possible.

3) If the variant of the events which has actually occurred
upon realization of conditions $\mathfrak{S}$ belongs to the set $A$
(defined in any way), then we say that the event $A$ has taken place.


4 The reader who is interested in the purely mathematical development of
the theory only, need not read this section, since the work following it is based
only upon the axioms in § 1 and makes no use of the present discussion. Here
we limit ourselves to a simple explanation of how the axioms of the theory of
probability arose and disregard the deep philosophical dissertations on the
concept of probability in the experimental world. In establishing the premises
necessary for the applicability of the theory of probability to the world of
actual events, the author has used, in large measure, the work of R. v. Mises,
[1] pp. 21-27.

Example: Let the complex $\mathfrak{S}$ of conditions be the tossing of a
coin two times. The set of events mentioned in Paragraph 2) consists
of the fact that at each toss either a head or tail may come up.
From this it follows that only four different variants (elementary
events) are possible, namely: HH, HT, TH, TT. If the "event A"
connotes the occurrence of a repetition, then it will consist of a
happening of either the first or the fourth of the four elementary
events. In this manner, every event may be regarded as a set of
elementary events.
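The two-toss example can be written out directly. In the following
minimal sketch the variable names and the helper `has_taken_place` are
assumptions introduced for illustration; the four variants and the event
"repetition" are those named in the example.

```python
from itertools import product

# The set E of all variants of two tosses: HH, HT, TH, TT.
E = ["".join(tosses) for tosses in product("HT", repeat=2)]

# Event A ("a repetition occurs") as a set of elementary events.
A = {xi for xi in E if xi[0] == xi[1]}


def has_taken_place(event, observed_variant):
    """Paragraph 3): the event occurs if the observed variant belongs to it."""
    return observed_variant in event
```

Here `A` consists of the first and fourth variants, and
`has_taken_place(A, "HT")` is false, since HT is not a repetition.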

4) Under certain conditions, which we shall not discuss here,
we may assume that to an event $A$ which may or may not occur
under conditions $\mathfrak{S}$, is assigned a real number $P(A)$ which
has the following characteristics:

(a) One can be practically certain that if the complex of
conditions $\mathfrak{S}$ is repeated a large number of times, $n$, then if
$m$ be the number of occurrences of event $A$, the ratio $m/n$ will
differ very slightly from $P(A)$.

(b) If $P(A)$ is very small, one can be practically certain that
when conditions $\mathfrak{S}$ are realized only once, the event $A$ would
not occur at all.
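Principle (a) can be illustrated with a small simulation. The sketch
below is hypothetical throughout: it assumes a fair coin and independent
tosses, so that the event "repetition in two tosses" is assigned
P(A) = 1/2, and it replaces the physical complex of conditions by
Python's pseudo-random generator. Neither assumption is part of the text.

```python
import random


def relative_frequency(n, seed=0):
    """Repeat the two-toss experiment n times and return the ratio m/n,
    where m counts the occurrences of the event 'both tosses agree'."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    m = 0
    for _ in range(n):
        first, second = rng.choice("HT"), rng.choice("HT")
        if first == second:
            m += 1
    return m / n


ratio = relative_frequency(100_000)
```

For a large number of repetitions the ratio should lie close to 1/2. The
simulation illustrates the principle but proves nothing: as Remark 1
below stresses, practical certainty for one series does not extend to
very many series at once.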

The Empirical Deduction of the Axioms. In general, one may
assume that the system $\mathfrak{F}$ of the observed events
$A, B, C, \ldots$ to which are assigned definite probabilities, forms a
field containing as an element the set $E$ (Axioms I, II, and the first
part of III, postulating the existence of probabilities). It is clear that
$0 \le m/n \le 1$ so that the second part of Axiom III is quite natural.
For the event $E$, $m$ is always equal to $n$, so that it is natural to
postulate $P(E) = 1$ (Axiom IV). If, finally, $A$ and $B$ are
non-intersecting (incompatible), then $m = m_1 + m_2$ where $m, m_1, m_2$
are respectively the number of experiments in which the events
$A + B$, $A$, and $B$ occur. From this it follows that

$$\frac{m}{n} = \frac{m_1}{n} + \frac{m_2}{n}.$$

It therefore seems appropriate to postulate that
$P(A + B) = P(A) + P(B)$ (Axiom V).

Remark 1. If two separate statements are each practically
reliable, then we may say that simultaneously they are both
reliable, although the degree of reliability is somewhat lowered in the
process. If, however, the number of such statements is very large,
then from the practical reliability of each, one cannot deduce
anything about the simultaneous correctness of all of them. Therefore
from the principle stated in (a) it does not follow that in a very
large number of series of $n$ tests each, in each the ratio $m/n$ will
differ only slightly from $P(A)$.

Remark 2. To an impossible event (an empty set) corresponds,
in accordance with our axioms, the probability $P(0) = 0$⁵,
but the converse is not true: $P(A) = 0$ does not imply the
impossibility of $A$. When $P(A) = 0$, from principle (b) all we can
assert is that when the conditions $\mathfrak{S}$ are realized but once,
event $A$ is practically impossible. It does not at all assert, however,
that in a sufficiently long series of tests the event $A$ will not occur.
On the other hand, one can deduce from the principle (a) merely that
when $P(A) = 0$ and $n$ is very large, the ratio $m/n$ will be very
small (it might, for example, be equal to $1/n$).

§  3.   Notes  on  Terminology 

We  have  defined  the  objects  of  our  future  study,  random 
events, as sets. However, in the theory of probability many
set-theoretic concepts are designated by other terms. We shall give
here  a  brief  list  of  such  concepts. 

Theory of Sets                            Random Events

1. $A$ and $B$ do not intersect,          1. Events $A$ and $B$ are
   i.e., $AB = 0$.                           incompatible.

2. $AB \ldots N = 0$.                     2. Events $A, B, \ldots, N$ are
                                             incompatible.

3. $AB \ldots N = X$.                     3. Event $X$ is defined as the
                                             simultaneous occurrence of
                                             events $A, B, \ldots, N$.

4. $A \dotplus B \dotplus \ldots          4. Event $X$ is defined as the
   \dotplus N = X$.                          occurrence of at least one of
                                             the events $A, B, \ldots, N$.

5. The complementary set $\bar{A}$.       5. The opposite event $\bar{A}$,
                                             consisting of the
                                             non-occurrence of event $A$.

6. $A = 0$.                               6. Event $A$ is impossible.

7. $A = E$.                               7. Event $A$ must occur.

8. The system $\mathfrak{A}$ of the sets  8. Experiment $\mathfrak{A}$ consists
   $A_1, A_2, \ldots, A_n$ forms a           of determining which of the
   decomposition of the set $E$ if           events $A_1, A_2, \ldots, A_n$
   $A_1 + A_2 + \ldots + A_n = E$.           occurs. We therefore call
   (This assumes that the sets $A_i$         $A_1, A_2, \ldots, A_n$ the
   do not intersect, in pairs.)              possible results of experiment
                                             $\mathfrak{A}$.

9. $B$ is a subset of $A$:                9. From the occurrence of event
   $B \subset A$.                            $B$ follows the inevitable
                                             occurrence of $A$.


5 Cf. § 4, Formula (3).


§ 4. Immediate Corollaries of the Axioms; Conditional
Probabilities; Theorem of Bayes

From $A + \bar{A} = E$ and the Axioms IV and V it follows that

$$P(A) + P(\bar{A}) = 1, \tag{1}$$

$$P(\bar{A}) = 1 - P(A). \tag{2}$$

Since $\bar{E} = 0$, then, in particular,

$$P(0) = 0. \tag{3}$$

If $A, B, \ldots, N$ are incompatible, then from Axiom V follows
the formula (the Addition Theorem)

$$P(A + B + \ldots + N) = P(A) + P(B) + \ldots + P(N). \tag{4}$$

If $P(A) > 0$, then the quotient

$$P_A(B) = \frac{P(AB)}{P(A)} \tag{5}$$

is defined to be the conditional probability of the event $B$ under
the condition $A$.

From (5) it follows immediately that

$$P(AB) = P(A)\,P_A(B). \tag{6}$$

And by induction we obtain the general formula (the
Multiplication Theorem)

$$P(A_1 A_2 \ldots A_n) = P(A_1)\,P_{A_1}(A_2)\,P_{A_1 A_2}(A_3) \ldots P_{A_1 A_2 \ldots A_{n-1}}(A_n). \tag{7}$$

The following theorems follow easily:

$$P_A(B) \ge 0, \tag{8}$$

$$P_A(E) = 1, \tag{9}$$

$$P_A(B + C) = P_A(B) + P_A(C). \tag{10}$$

Comparing formulae (8)-(10) with Axioms III-V, we find that
the system $\mathfrak{F}$ of sets together with the set function $P_A(B)$
(provided $A$ is a fixed set), form a field of probability and therefore
all the above general theorems concerning $P(B)$ hold true for the
conditional probability $P_A(B)$ (provided the event $A$ is fixed).
It is also easy to see that

$$P_A(A) = 1. \tag{11}$$

From (6) and the analogous formula

$$P(AB) = P(B)\,P_B(A)$$

we obtain the important formula:

$$P_B(A) = \frac{P(A)\,P_A(B)}{P(B)}, \tag{12}$$

which contains, in essence, the Theorem of Bayes.

The Theorem on Total Probability: Let
$A_1 + A_2 + \ldots + A_n = E$ (this assumes that the events
$A_1, A_2, \ldots, A_n$ are mutually exclusive) and let $X$ be arbitrary.
Then

$$P(X) = P(A_1)\,P_{A_1}(X) + P(A_2)\,P_{A_2}(X) + \ldots + P(A_n)\,P_{A_n}(X). \tag{13}$$

Proof:

$$X = A_1 X + A_2 X + \ldots + A_n X;$$

using (4) we have

$$P(X) = P(A_1 X) + P(A_2 X) + \ldots + P(A_n X)$$

and according to (6) we have at the same time

$$P(A_i X) = P(A_i)\,P_{A_i}(X).$$

The Theorem of Bayes: Let $A_1 + A_2 + \ldots + A_n = E$ and
$X$ be arbitrary, then

$$P_X(A_i) = \frac{P(A_i)\,P_{A_i}(X)}{P(A_1)\,P_{A_1}(X) + P(A_2)\,P_{A_2}(X) + \ldots + P(A_n)\,P_{A_n}(X)}, \tag{14}$$

$$i = 1, 2, 3, \ldots, n.$$

$A_1, A_2, \ldots, A_n$ are often called "hypotheses" and formula
(14) is considered as the probability $P_X(A_i)$ of the hypothesis
$A_i$ after the occurrence of event $X$. [$P(A_i)$ then denotes the
a priori probability of $A_i$.]

Proof: From (12) we have

$$P_X(A_i) = \frac{P(A_i)\,P_{A_i}(X)}{P(X)}.$$

To  obtain  the  formula  (14)  it  only  remains  to  substitute  for  the 
probability  P(X)  its  value  derived  from  (13)  by  applying  the 
theorem  on  total  probability. 
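Formulas (13) and (14) translate directly into a short computation. The
following is a sketch under assumptions of our own: the function names,
and the two-urn numbers used as input (two equally likely hypotheses
under which the event X has conditional probability 3/4 and 1/4
respectively), are hypothetical and serve only to exercise the formulas.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Formula (13): P(X) = sum over i of P(A_i) * P_{A_i}(X)."""
    return sum((p * l for p, l in zip(priors, likelihoods)), Fraction(0))


def bayes(priors, likelihoods):
    """Formula (14): the probabilities P_X(A_i) of all hypotheses A_i."""
    p_x = total_probability(priors, likelihoods)
    if p_x == 0:
        raise ValueError("P(X) must be positive for P_X to be defined")
    return [p * l / p_x for p, l in zip(priors, likelihoods)]


# Hypothetical data: a priori probabilities P(A_i) and the conditional
# probabilities P_{A_i}(X) of the observed event X under each hypothesis.
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(3, 4), Fraction(1, 4)]

p_x = total_probability(priors, likelihoods)
posteriors = bayes(priors, likelihoods)
```

The posterior probabilities sum to 1, as they must, because P_X is
itself a probability function on the same system of sets by formulas
(8)-(10).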

§  5.   Independence 

The concept of mutual independence of two or more
experiments holds, in a certain sense, a central position in the theory
of probability. Indeed, as we have already seen, the theory of
probability can be regarded from the mathematical point of view
as a special application of the general theory of additive set
functions. One naturally asks, how did it happen that the theory of
probability developed into a large individual science possessing
its own methods?

In order to answer this question, we must point out the
specialization undergone by general problems in the theory of
additive set functions when they are proposed in the theory of
probability.

The fact that our additive set function $P(A)$ is non-negative
and satisfies the condition $P(E) = 1$, does not in itself cause new
difficulties. Random variables (see Chap. III) from a
mathematical point of view represent merely functions measurable with
respect to $P(A)$, while their mathematical expectations are
abstract Lebesgue integrals. (This analogy was explained fully
for the first time in the work of Fréchet⁶.) The mere introduction
of the above concepts, therefore, would not be sufficient to
produce a basis for the development of a large new theory.

Historically, the independence of experiments and random
variables represents the very mathematical concept that has given
the theory of probability its peculiar stamp. The classical work
of Laplace, Poisson, Tchebychev, Markov, Liapounov, Mises, and
Bernstein is actually dedicated to the fundamental investigation
of series of independent random variables. Though the latest
dissertations (Markov, Bernstein and others) frequently fail to
assume complete independence, they nevertheless reveal the
necessity of introducing analogous, weaker, conditions, in order
to obtain sufficiently significant results (see in this chapter § 6,
Markov chains).


6 See Fréchet [1] and [2].

We  thus  see,  in  the  concept  of  independence,  at  least  the  germ 
of  the  peculiar  type  of  problem  in  probability  theory.  In  this 
book,  however,  we  shall  not  stress  that  fact,  for  here  we  are 
interested  mainly  in  the  logical  foundation  for  the  specialized 
investigations  of  the  theory  of  probability. 

In  consequence,  one  of  the  most  important  problems  in  the 
philosophy  of  the  natural  sciences  is — in  addition  to  the  well- 
known  one  regarding  the  essence  of  the  concept  of  probability 
itself — to  make  precise  the  premises  which  would  make  it  possible 
to  regard  any  given  real  events  as  independent.  This  question, 
however,  is  beyond  the  scope  of  this  book. 


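Let us turn to the definition of independence. Given $n$
experiments $\mathfrak{A}^{(1)}, \mathfrak{A}^{(2)}, \ldots, \mathfrak{A}^{(n)}$,
that is, $n$ decompositions

$$E = A_1^{(i)} + A_2^{(i)} + \ldots + A_{r_i}^{(i)}, \qquad i = 1, 2, \ldots, n$$

of the basic set $E$. It is then possible to assign $r = r_1 r_2 \ldots r_n$
probabilities (in the general case)

$$p_{q_1 q_2 \ldots q_n} = P(A_{q_1}^{(1)} A_{q_2}^{(2)} \ldots A_{q_n}^{(n)}) \ge 0$$

which are entirely arbitrary except for the single condition⁷ that

$$\sum_{q_1, q_2, \ldots, q_n} p_{q_1 q_2 \ldots q_n} = 1. \tag{1}$$

Definition I. $n$ experiments
$\mathfrak{A}^{(1)}, \mathfrak{A}^{(2)}, \ldots, \mathfrak{A}^{(n)}$ are called
mutually independent, if for any $q_1, q_2, \ldots, q_n$ the following
equation holds true:

$$P(A_{q_1}^{(1)} A_{q_2}^{(2)} \ldots A_{q_n}^{(n)}) = P(A_{q_1}^{(1)})\,P(A_{q_2}^{(2)}) \ldots P(A_{q_n}^{(n)}). \tag{2}$$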


7 One may construct a field of probability with arbitrary probabilities
subject only to the above-mentioned conditions, as follows: $E$ is composed of
$r$ elements $\xi_{q_1 q_2 \ldots q_n}$. Let the corresponding elementary
probabilities be $p_{q_1 q_2 \ldots q_n}$, and finally let $A_q^{(i)}$ be the
set of all $\xi_{q_1 q_2 \ldots q_n}$ for which $q_i = q$.

Among the $r$ equations in (2), there are only
$r - r_1 - r_2 - \ldots - r_n + n - 1$ independent equations⁸.

Theorem I. If $n$ experiments
$\mathfrak{A}^{(1)}, \mathfrak{A}^{(2)}, \ldots, \mathfrak{A}^{(n)}$ are
mutually independent, then any $m$ of them ($m < n$),
$\mathfrak{A}^{(i_1)}, \mathfrak{A}^{(i_2)}, \ldots, \mathfrak{A}^{(i_m)}$,
are also independent⁹.

In the case of independence we then have the equations:

$$P(A_{q_1}^{(i_1)} A_{q_2}^{(i_2)} \ldots A_{q_m}^{(i_m)}) = P(A_{q_1}^{(i_1)})\,P(A_{q_2}^{(i_2)}) \ldots P(A_{q_m}^{(i_m)}) \tag{3}$$

(all $i_k$ must be different.)

Definition II. $n$ events $A_1, A_2, \ldots, A_n$ are mutually
independent, if the decompositions (trials)

$$E = A_k + \bar{A}_k \qquad (k = 1, 2, \ldots, n)$$

are independent.

In this case $r_1 = r_2 = \ldots = r_n = 2$, $r = 2^n$; therefore, of the
$2^n$ equations in (2) only $2^n - n - 1$ are independent. The necessary
and sufficient conditions for the independence of the events
$A_1, A_2, \ldots, A_n$ are the following $2^n - n - 1$ equations¹⁰:

$$P(A_{i_1} A_{i_2} \ldots A_{i_m}) = P(A_{i_1})\,P(A_{i_2}) \ldots P(A_{i_m}), \tag{4}$$

$$m = 1, 2, \ldots, n, \qquad 1 \le i_1 < i_2 < \ldots < i_m \le n.$$

All of these equations are mutually independent.

8 Actually, in the case of independence, one may choose arbitrarily only
$r_1 + r_2 + \ldots + r_n$ probabilities $p_q^{(i)} = P(A_q^{(i)})$ so as to
comply with the $n$ conditions

$$\sum_q p_q^{(i)} = 1.$$

Therefore, in the general case, we have $r - 1$ degrees of freedom, but in the
case of independence only $r_1 + r_2 + \ldots + r_n - n$.

9 To prove this it is sufficient to show that from the mutual independence
of $n$ decompositions follows the mutual independence of the first $n - 1$. Let
us assume that the equations (2) hold. Then

$$P(A_{q_1}^{(1)} \ldots A_{q_{n-1}}^{(n-1)}) = \sum_{q_n} P(A_{q_1}^{(1)} \ldots A_{q_n}^{(n)}) = P(A_{q_1}^{(1)}) \ldots P(A_{q_{n-1}}^{(n-1)}) \sum_{q_n} P(A_{q_n}^{(n)}) = P(A_{q_1}^{(1)}) \ldots P(A_{q_{n-1}}^{(n-1)}),$$

Q.E.D.

10 See S. N. Bernstein [1] pp. 47-57. However, the reader can easily prove
this himself (using mathematical induction).

In the case $n = 2$ we obtain from (4) only one condition
($2^2 - 2 - 1 = 1$) for the independence of two events $A_1$ and $A_2$:

$$P(A_1 A_2) = P(A_1)\,P(A_2). \tag{5}$$

The system of equations (2) reduces itself, in this case, to three
equations, besides (5):

$$P(A_1 \bar{A}_2) = P(A_1)\,P(\bar{A}_2),$$
$$P(\bar{A}_1 A_2) = P(\bar{A}_1)\,P(A_2),$$
$$P(\bar{A}_1 \bar{A}_2) = P(\bar{A}_1)\,P(\bar{A}_2),$$

which obviously follow from (5).¹¹

It need hardly be remarked that from the independence of
the events $A_1, A_2, \ldots, A_n$ in pairs, i.e. from the relations

$$P(A_i A_j) = P(A_i)\,P(A_j), \qquad i \ne j,$$

it does not at all follow that when $n > 2$ these events are
independent¹². (For that we need the existence of all equations (4).)
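The difference between independence in pairs and mutual independence
can be checked mechanically on the example attributed to S. N. Bernstein
in the footnote to this paragraph: four elementary events of probability
1/4 each, and three events that share exactly one elementary event. The
sketch below tests the product equations (4) for every sub-collection of
at least two events; the function names and the labels 1, 2, 3, 4 for
the elementary events are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Four elementary events, labelled 1..4, each with probability 1/4.
p = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}

A = frozenset({1, 2})
B = frozenset({1, 3})
C = frozenset({1, 4})


def P(event):
    return sum((p[xi] for xi in event), Fraction(0))


def product_rule_holds(events):
    """Check P(A_i1 ... A_im) = P(A_i1) ... P(A_im) for one sub-collection."""
    intersection = frozenset.intersection(*events)
    product = Fraction(1)
    for event in events:
        product *= P(event)
    return P(intersection) == product


def pairwise_independent(events):
    return all(product_rule_holds(pair) for pair in combinations(events, 2))


def mutually_independent(events):
    """All equations (4) with m >= 2; the case m = 1 holds trivially."""
    return all(
        product_rule_holds(subset)
        for m in range(2, len(events) + 1)
        for subset in combinations(events, m)
    )
```

For these three events every pair satisfies the product rule, while the
triple does not: P(ABC) = 1/4, whereas P(A)P(B)P(C) = 1/8.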

In introducing the concept of independence, no use was made
of conditional probability. Our aim has been to explain as clearly
as possible, in a purely mathematical manner, the meaning of this
concept. Its applications, however, generally depend upon the
properties of certain conditional probabilities.

If we assume that all probabilities $P(A_q^{(i)})$ are positive, then
from the equations (3) it follows¹³ that

$$P_{A_{q_1}^{(i_1)} A_{q_2}^{(i_2)} \ldots A_{q_{m-1}}^{(i_{m-1})}}(A_{q_m}^{(i_m)}) = P(A_{q_m}^{(i_m)}). \tag{6}$$

From the fact that formulas (6) hold, and from the
Multiplication Theorem (Formula (7), § 4), follow the formulas (2). We
obtain, therefore,

Theorem II: A necessary and sufficient condition for
independence of experiments
$\mathfrak{A}^{(1)}, \mathfrak{A}^{(2)}, \ldots, \mathfrak{A}^{(n)}$ in the case
of positive probabilities $P(A_q^{(i)})$ is that the conditional
probability of the results $A_q^{(i)}$ of experiments $\mathfrak{A}^{(i)}$
under the hypothesis that several other tests
$\mathfrak{A}^{(i_1)}, \mathfrak{A}^{(i_2)}, \ldots, \mathfrak{A}^{(i_k)}$ have
had definite results
$A_{q_1}^{(i_1)}, A_{q_2}^{(i_2)}, \ldots, A_{q_k}^{(i_k)}$ is equal to the
absolute probability $P(A_q^{(i)})$.

11 $P(A_1 \bar{A}_2) = P(A_1) - P(A_1 A_2) = P(A_1) - P(A_1)P(A_2) = P(A_1)\{1 - P(A_2)\} = P(A_1)P(\bar{A}_2)$, etc.

12 This can be shown by the following simple example (S. N. Bernstein):
Let set $E$ be composed of four elements $\xi_1, \xi_2, \xi_3, \xi_4$; the
corresponding elementary probabilities $p_1, p_2, p_3, p_4$ are each assumed
to be 1/4 and

$$A = \{\xi_1, \xi_2\}, \qquad B = \{\xi_1, \xi_3\}, \qquad C = \{\xi_1, \xi_4\}.$$

It is easy to compute that

$$P(A) = P(B) = P(C) = 1/2,$$
$$P(AB) = P(BC) = P(AC) = 1/4 = (1/2)^2,$$
$$P(ABC) = 1/4 \ne (1/2)^3.$$

13 To prove it, one must keep in mind the definition of conditional
probability (Formula (5), § 4) and substitute for the probabilities of
products the products of probabilities according to formula (3).
On the basis of formulas (4) we can prove in an analogous
manner the following theorem:

Theorem III. If all probabilities $P(A_k)$ are positive, then a
necessary and sufficient condition for mutual independence of
the events $A_1, A_2, \ldots, A_n$ is the satisfaction of the equations

$$P_{A_{i_1} A_{i_2} \ldots A_{i_k}}(A_i) = P(A_i) \tag{7}$$

for any pairwise different $i_1, i_2, \ldots, i_k, i$.

In the case $n = 2$ the conditions (7) reduce to two equations:

$$P_{A_1}(A_2) = P(A_2), \qquad P_{A_2}(A_1) = P(A_1). \tag{8}$$

It is easy to see that the first equation in (8) alone is a necessary
and sufficient condition for the independence of $A_1$ and $A_2$
provided $P(A_1) > 0$.

§  6.   Conditional  Probabilities  as  Random  Variables, 
Markov  Chains 

Let $\mathfrak{A}$ be a decomposition of the fundamental set $E$:

$$E = A_1 + A_2 + \ldots + A_r,$$

and $x$ a real function of the elementary event $\xi$ which for every
set $A_q$ is equal to a corresponding constant $a_q$. $x$ is then called a
random variable, and the sum

$$\mathsf{E}(x) = \sum_q a_q P(A_q)$$

is called the mathematical expectation of the variable $x$. The
theory of random variables will be developed in Chaps. III and IV.
We shall not limit ourselves there merely to those random
variables which can assume only a finite number of different values.
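A random variable that is constant on each set of a decomposition, and
its mathematical expectation as the sum of a_q P(A_q), can be sketched
as follows. All concrete data here are hypothetical: six equally likely
elementary events labelled 1 to 6, a decomposition into three sets, and
the constants 0, 10, 5 are chosen only to make the arithmetic visible.

```python
from fractions import Fraction

# Hypothetical finite field: six elementary events of probability 1/6 each.
elementary = {xi: Fraction(1, 6) for xi in range(1, 7)}
E = frozenset(elementary)

# A decomposition E = A_1 + A_2 + A_3 and the constants a_1, a_2, a_3.
parts = [frozenset({1, 2}), frozenset({3, 4, 5}), frozenset({6})]
a = [Fraction(0), Fraction(10), Fraction(5)]


def P(event):
    return sum((elementary[xi] for xi in event), Fraction(0))


def is_decomposition(parts, whole):
    """The parts must be pairwise disjoint and have `whole` as their sum."""
    union = frozenset().union(*parts)
    return union == whole and sum(len(part) for part in parts) == len(whole)


def x(xi):
    """The random variable: equal to a_q on every elementary event of A_q."""
    for part, value in zip(parts, a):
        if xi in part:
            return value
    raise ValueError("elementary event outside the decomposition")


def expectation(parts, values):
    """The sum of a_q * P(A_q) over all sets of the decomposition."""
    return sum((value * P(part) for part, value in zip(parts, values)), Fraction(0))


mean = expectation(parts, a)
```

Summing x over the elementary events, each weighted by its elementary
probability, gives the same value, which is one way to see that the
expectation depends only on x and not on how the decomposition is
written down.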
A random variable which for every set $A_q$ assumes the value
$P_{A_q}(B)$, we shall call the conditional probability of the event $B$
after the given experiment $\mathfrak{A}$ and shall designate it by
$P_{\mathfrak{A}}(B)$. Two experiments $\mathfrak{A}^{(1)}$ and
$\mathfrak{A}^{(2)}$ are independent if, and only if,

$$P_{\mathfrak{A}^{(1)}}(A_q^{(2)}) = P(A_q^{(2)}), \qquad q = 1, 2, \ldots, r_2.$$

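Given any decompositions (experiments)
$\mathfrak{A}^{(1)}, \mathfrak{A}^{(2)}, \ldots, \mathfrak{A}^{(n)}$, we
shall represent by

$$\mathfrak{A}^{(1)} \mathfrak{A}^{(2)} \ldots \mathfrak{A}^{(n)}$$

the decomposition of set $E$ into the products

$$A_{q_1}^{(1)} A_{q_2}^{(2)} \ldots A_{q_n}^{(n)}.$$

Experiments $\mathfrak{A}^{(1)}, \mathfrak{A}^{(2)}, \ldots, \mathfrak{A}^{(n)}$
are mutually independent when and only when

$$P_{\mathfrak{A}^{(1)} \mathfrak{A}^{(2)} \ldots \mathfrak{A}^{(k-1)}}(A_q^{(k)}) = P(A_q^{(k)}),$$

$k$ and $q$ being arbitrary¹⁴.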

Definition: The sequence
$\mathfrak{A}^{(1)}, \mathfrak{A}^{(2)}, \ldots, \mathfrak{A}^{(n)}, \ldots$
forms a Markov chain if for arbitrary $n$ and $q$

$$P_{\mathfrak{A}^{(1)} \mathfrak{A}^{(2)} \ldots \mathfrak{A}^{(n-1)}}(A_q^{(n)}) = P_{\mathfrak{A}^{(n-1)}}(A_q^{(n)}).$$

Thus, Markov chains form a natural generalization of
sequences of mutually independent experiments. If we set

$$p_{q_m q_n}(m, n) = P_{A_{q_m}^{(m)}}(A_{q_n}^{(n)}), \qquad m < n,$$

then the basic formula of the theory of Markov chains will assume
the form:

$$p_{q_k q_n}(k, n) = \sum_{q_m} p_{q_k q_m}(k, m)\,p_{q_m q_n}(m, n), \qquad k < m < n. \tag{1}$$

If we denote the matrix $\|p_{q_m q_n}(m, n)\|$ by $p(m, n)$, (1) can be
written as¹⁵:

$$p(k, n) = p(k, m)\,p(m, n), \qquad k < m < n. \tag{2}$$
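Formula (2) can be checked numerically on a small chain. The sketch
below is an illustration under assumptions of our own: three successive
experiments with two possible results each, a hypothetical initial
distribution, and hypothetical one-step matrices `step12` and `step23`.
The joint probabilities of the three results are built so that the
Markov condition holds, the matrices p(m, n) are then recovered from the
joint probabilities by the definition of conditional probability, and
p(1, 3) is compared with the matrix product p(1, 2) p(2, 3).

```python
from fractions import Fraction
from itertools import product

states = range(2)

# Hypothetical data: distribution of the first experiment and one-step matrices.
initial = [Fraction(1, 3), Fraction(2, 3)]
step12 = [[Fraction(9, 10), Fraction(1, 10)], [Fraction(1, 2), Fraction(1, 2)]]
step23 = [[Fraction(1, 4), Fraction(3, 4)], [Fraction(2, 5), Fraction(3, 5)]]

# Elementary probabilities of the triples (q1, q2, q3) of results.
joint = {
    (q1, q2, q3): initial[q1] * step12[q1][q2] * step23[q2][q3]
    for q1, q2, q3 in product(states, repeat=3)
}


def P(predicate):
    """Probability of the set of triples satisfying `predicate`."""
    return sum((pr for triple, pr in joint.items() if predicate(triple)), Fraction(0))


def transition(m, n):
    """Matrix p(m, n): entry [a][b] is the conditional probability that
    experiment n gives result b, given that experiment m gave result a.
    Experiments are numbered from 1."""
    return [
        [
            P(lambda t: t[m - 1] == a and t[n - 1] == b) / P(lambda t: t[m - 1] == a)
            for b in states
        ]
        for a in states
    ]


def matmul(X, Y):
    return [
        [sum((X[i][k] * Y[k][j] for k in range(len(Y))), Fraction(0))
         for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


p13 = transition(1, 3)
p12_p23 = matmul(transition(1, 2), transition(2, 3))
```

With exact fractions the two matrices agree entry by entry, which is
formula (2) for k = 1, m = 2, n = 3.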


14 The necessity of these conditions follows from Theorem II, § 5; that they
are also sufficient follows immediately from the Multiplication Theorem
(Formula (7) of § 4).

15 For further development of the theory of Markov chains, see R. v. Mises
[1], § 16, and B. Hostinsky, Méthodes générales du calcul des probabilités,
"Mém. Sci. Math." V. 52, Paris 1931.


Chapter  II 

INFINITE  PROBABILITY  FIELDS 

§  1.   Axiom  of  Continuity 

We denote by $\mathfrak{D}_m A_m$, as is customary, the product of the
sets $A_m$ (whether finite or infinite in number) and their sum by
$\mathfrak{S}_m A_m$. Only in the case of disjoint sets $A_m$ is the form
$\sum_m A_m$ used instead of $\mathfrak{S}_m A_m$. Consequently,

$$\mathfrak{S}_m A_m = A_1 \dotplus A_2 \dotplus \cdots,$$

$$\sum_m A_m = A_1 + A_2 + \cdots,$$

$$\mathfrak{D}_m A_m = A_1 A_2 \cdots.$$

In  all  future  investigations,  we  shall  assume  that  besides  Axioms 
I  -  V,  still  another  holds  true : 

VI. For a decreasing sequence of events

$$A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots \tag{1}$$

of $\mathfrak{F}$ for which

$$\mathfrak{D}_n A_n = 0, \tag{2}$$

the following equation holds:

$$\lim_{n \to \infty} P(A_n) = 0. \tag{3}$$

In  the  future  we  shall  designate  by  probability  field  only  a 
field  of  probability  as  outlined  in  the  first  chapter,  which  also 
satisfies  Axiom  VI.  The  fields  of  probability  as  defined  in  the  first 
chapter  without  Axiom  VI  might  be  called  generalized  fields  of 
probability. 

If the system $\mathfrak{F}$ of sets is finite, Axiom VI follows from Axioms I - V. For actually, in that case there exist only a finite number of different sets in the sequence (1). Let $A_k$ be the smallest among them, then all sets $A_{k+p}$ coincide with $A_k$ and we obtain then

$$A_k = A_{k+p} = \mathfrak{D}_n A_n = 0,$$

$$\lim P(A_n) = P(0) = 0.$$

All examples of finite fields of probability, in the first chapter, satisfy, therefore, Axiom VI. The system of Axioms I - VI then proves to be consistent and incomplete.

For infinite fields, on the other hand, the Axiom of Continuity, VI, proved to be independent of Axioms I - V. Since the new axiom is essential for infinite fields of probability only, it is almost impossible to elucidate its empirical meaning, as has been done, for example, in the case of Axioms I - V in § 2 of the first chapter. For, in describing any observable random process we can obtain only finite fields of probability. Infinite fields of probability occur only as idealized models of real random processes. We limit ourselves, arbitrarily, to only those models which satisfy Axiom VI. This limitation has been found expedient in researches of the most diverse sort.

Generalized Addition Theorem: If $A_1, A_2, \ldots, A_n, \ldots$ and $A$ belong to $\mathfrak{F}$, then from

$$A = \sum_n A_n \tag{4}$$

follows the equation

$$P(A) = \sum_n P(A_n). \tag{5}$$

Proof: Let

$$R_n = \sum_{m > n} A_m.$$

Then, obviously

$$\mathfrak{D}_n (R_n) = 0,$$

and, therefore, according to Axiom VI

$$\lim_{n \to \infty} P(R_n) = 0. \tag{6}$$

On the other hand, by the addition theorem

$$P(A) = P(A_1) + P(A_2) + \cdots + P(A_n) + P(R_n). \tag{7}$$

From (6) and (7) we immediately obtain (5).

We have shown, then, that the probability $P(A)$ is a completely additive set function on $\mathfrak{F}$. Conversely, Axioms V and VI hold true for every completely additive set function defined on any field $\mathfrak{F}$.* We can, therefore, define the concept of a field of probability in the following way: Let $E$ be an arbitrary set, $\mathfrak{F}$ a field of subsets of $E$, containing $E$, and $P(A)$ a non-negative completely additive set function defined on $\mathfrak{F}$; the field $\mathfrak{F}$ together with the set function $P(A)$ forms a field of probability.

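A Covering Theorem: If $A, A_1, A_2, \ldots, A_n, \ldots$ belong to $\mathfrak{F}$ and

$$A \subset \mathfrak{S}_n A_n, \tag{8}$$

then

$$P(A) \le \sum_n P(A_n).$$

Proof:

$$A = A\, \mathfrak{S}_n (A_n) = A A_1 + A (A_2 - A_2 A_1) + A (A_3 - A_3 A_2 - A_3 A_1) + \cdots,$$

$$P(A) = P(A A_1) + P\{A (A_2 - A_2 A_1)\} + \cdots \le P(A_1) + P(A_2) + \cdots.$$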
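The decomposition used in this proof can be traced on a small finite example. The Python sketch below is only an illustration under stated assumptions: the six-point space, the weights, the set `A` and the covering `cover` are hypothetical, and `disjoint_pieces` is a helper name introduced here.

```python
from fractions import Fraction

# Hypothetical finite field of probability: six equally likely elementary events.
weights = {xi: Fraction(1, 6) for xi in range(1, 7)}

def prob(event):
    """P(A) as the sum of the weights of the elementary events in A."""
    return sum((weights[xi] for xi in event), Fraction(0))

def disjoint_pieces(target, sets):
    """Split target into A*A_1, A*(A_2 - A_2*A_1), ... as in the proof."""
    seen = set()
    pieces = []
    for s in sets:
        pieces.append((target & s) - seen)  # part of A covered for the first time
        seen |= s
    return pieces

A = {2, 3, 4}
cover = [{1, 2}, {2, 3}, {3, 4, 5}]   # A is contained in the sum of these sets

pieces = disjoint_pieces(A, cover)
total_of_pieces = sum((prob(p) for p in pieces), Fraction(0))
total_of_cover = sum((prob(s) for s in cover), Fraction(0))
```

Here `total_of_pieces` equals `prob(A)` because the pieces are disjoint and exhaust `A`, while each piece is contained in the corresponding covering set, which gives the inequality of the theorem.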

§  2.   Borel  Fields  of  Probability 

The field $\mathfrak{F}$ is called a Borel field, if all countable sums $\sum_n A_n$ of the sets $A_n$ from $\mathfrak{F}$ belong to $\mathfrak{F}$. Borel fields are also called completely additive systems of sets. From the formula

$$\mathfrak{S}_n A_n = A_1 + (A_2 - A_2 A_1) + (A_3 - A_3 A_2 - A_3 A_1) + \cdots \tag{1}$$

we can deduce that a Borel field contains also all the sums $\mathfrak{S}_n A_n$ composed of a countable number of sets $A_n$ belonging to it. From the formula

$$\mathfrak{D}_n A_n = E - \mathfrak{S}_n \bar{A}_n, \tag{2}$$

where $\bar{A}_n = E - A_n$ is the complement of $A_n$, the same can be said for the product of sets.

A field of probability is a Borel field of probability if the corresponding field $\mathfrak{F}$ is a Borel field. Only in the case of Borel fields of probability do we obtain full freedom of action, without danger of the occurrence of events having no probability. We shall now prove that we may limit ourselves to the investigation of Borel fields of probability. This will follow from the so-called extension theorem, to which we shall now turn.

Given a field of probability $(\mathfrak{F}, P)$. As is known¹, there exists a smallest Borel field $B\mathfrak{F}$ containing $\mathfrak{F}$. And we have the

Extension Theorem: It is always possible to extend a non-negative completely additive set function $P(A)$, defined in $\mathfrak{F}$, to all sets of $B\mathfrak{F}$ without losing either of its properties (non-negativeness and complete additivity) and this can be done in only one way.

The extended field $B\mathfrak{F}$ forms with the extended set function $P(A)$ a field of probability $(B\mathfrak{F}, P)$. This field of probability $(B\mathfrak{F}, P)$ we shall call the Borel extension of the field $(\mathfrak{F}, P)$.


*  See, for example, O. Nikodym, Sur une généralisation des intégrales de M. J. Radon, Fund. Math. v. 15, 1930, p. 136.
1  Hausdorff, Mengenlehre, 1927, p. 85.

The  proof  of  this  theorem,  which  belongs  to  the  theory  of 
additive  set  functions  and  which  sometimes  appears  in  other 
forms,  can  be  given  as  follows: 

Let $A$ be any subset of $E$; we shall denote by $P^*(A)$ the lower limit of the sums

$$\sum_n P(A_n)$$

for all coverings

$$A \subset \mathfrak{S}_n A_n$$

of the set $A$ by a finite or countable number of sets $A_n$ of $\mathfrak{F}$. It is easy to prove that $P^*(A)$ is then an outer measure in the Carathéodory sense². In accordance with the Covering Theorem (§ 1), $P^*(A)$ coincides with $P(A)$ for all sets of $\mathfrak{F}$. It can be further shown that all sets of $\mathfrak{F}$ are measurable in the Carathéodory sense. Since all measurable sets form a Borel field, all sets of $B\mathfrak{F}$ are consequently measurable. The set function $P^*(A)$ is, therefore, completely additive on $B\mathfrak{F}$, and on $B\mathfrak{F}$ we may set

$$P(A) = P^*(A).$$

We have thus shown the existence of the extension. The uniqueness of this extension follows immediately from the minimal property of the field $B\mathfrak{F}$.

Remark: Even if the sets (events) $A$ of $\mathfrak{F}$ can be interpreted as actual and (perhaps only approximately) observable events, it does not, of course, follow from this that the sets of the extended field $B\mathfrak{F}$ reasonably admit of such an interpretation.

Thus there is the possibility that while a field of probability $(\mathfrak{F}, P)$ may be regarded as the image (idealized, however) of actual random events, the extended field of probability $(B\mathfrak{F}, P)$ will still remain merely a mathematical structure.

Thus sets of $B\mathfrak{F}$ are generally merely ideal events to which nothing corresponds in the outside world. However, if reasoning which utilizes the probabilities of such ideal events leads us to a determination of the probability of an actual event of $\mathfrak{F}$, then, from an empirical point of view also, this determination will automatically fail to be contradictory.


2  Carathéodory, Vorlesungen über reelle Funktionen, pp. 237-258. (New York, Chelsea Publishing Company).

§  3.   Examples  of  Infinite  Fields  of  Probability 

I.  In  §  1  of  the  first  chapter,  we  have  constructed  various 
finite  probability  fields. 

Let now $E = \{\xi_1, \xi_2, \ldots, \xi_n, \ldots\}$ be a countable set, and let $\mathfrak{F}$ coincide with the aggregate of the subsets of $E$.

All possible probability fields with such an aggregate $\mathfrak{F}$ are obtained in the following manner:

We take a sequence of non-negative numbers $p_n$, such that

$$p_1 + p_2 + \cdots + p_n + \cdots = 1$$

and for each set $A$ put

$$P(A) = {\sum_n}' p_n,$$

where the summation $\sum'$ extends to all the indices $n$ for which $\xi_n$ belongs to $A$. These fields of probability are obviously Borel fields.
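A minimal Python sketch of this construction follows, assuming the particular weights $p_n = 2^{-n}$ (a choice made here for illustration only, since the text allows any non-negative sequence with sum 1). The function names `p` and `prob` are likewise hypothetical.

```python
from fractions import Fraction

def p(n):
    """Assumed weight of the elementary event xi_n, n = 1, 2, ...; the weights sum to 1."""
    return Fraction(1, 2 ** n)

def prob(indices):
    """P(A) for the set A of those xi_n whose index n lies in `indices`."""
    return sum((p(n) for n in indices), Fraction(0))

p_a = prob({1, 3})            # P{xi_1, xi_3}
p_b = prob({2})               # P{xi_2}
p_union = prob({1, 2, 3})     # additivity: equals p_a + p_b for these disjoint sets

# Partial sums over the even indices 2, 4, ..., 40 approach the value 1/3.
p_even_partial = prob(range(2, 41, 2))
gap = Fraction(1, 3) - p_even_partial
```

The quantity `gap` is the probability of the remaining even indices beyond 40; it tends to zero as more terms are included, in agreement with Axiom VI.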

II.  In this example, we shall assume that $E$ represents the real number axis. At first, let $\mathfrak{F}$ be formed of all possible finite sums of half-open intervals $[a; b) = \{a \le \xi < b\}$ (taking into consideration not only the proper intervals, with finite $a$ and $b$, but also the improper intervals $[-\infty; a)$, $[a; +\infty)$ and $[-\infty; +\infty)$). $\mathfrak{F}$ is then a field. By means of the extension theorem, however, each field of probability on $\mathfrak{F}$ can be extended to a similar field on $B\mathfrak{F}$. The system of sets $B\mathfrak{F}$ is, therefore, in our case nothing but the system of all Borel point sets on a line. Let us turn now to the following case.

III.  Again suppose $E$ to be the real number axis, while $\mathfrak{F}$ is composed of all Borel point sets of this line. In order to construct a field of probability with the given field $\mathfrak{F}$, it is sufficient to define an arbitrary non-negative completely additive set-function $P(A)$ on $\mathfrak{F}$ which satisfies the condition $P(E) = 1$. As is well known³, such a function is uniquely determined by its values

$$P[-\infty; x) = F(x) \tag{1}$$

for the special intervals $[-\infty; x)$. The function $F(x)$ is called the distribution function of $\xi$. Further on (Chap. III, § 2) we shall show that $F(x)$ is non-decreasing, continuous on the left, and has the following limiting values:

$$\lim_{x \to -\infty} F(x) = F(-\infty) = 0, \qquad \lim_{x \to +\infty} F(x) = F(+\infty) = 1. \tag{2}$$

Conversely, if a given function $F(x)$ satisfies these conditions, then it always determines a non-negative completely additive set-function $P(A)$ for which $P(E) = 1$⁴.
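To make the convention $F(x) = P[-\infty; x)$ concrete, here is a small Python sketch. The distribution is an assumption chosen for illustration: a mass of 1/2 concentrated at the point 0, plus a mass of 1/2 spread uniformly over $[0, 1)$. The names `F` and `prob_interval` are hypothetical.

```python
from fractions import Fraction

def F(x):
    """F(x) = P[-inf; x) for the assumed mixed distribution described above."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)            # the point mass at 0 is not counted in [-inf; 0)
    if x <= 1:
        return Fraction(1, 2) + x / 2
    return Fraction(1)

def prob_interval(a, b):
    """Probability of the half-open interval [a; b) as the difference F(b) - F(a)."""
    return F(b) - F(a)

left_value = F(0)                                  # value at the jump
just_right = F(Fraction(1, 1000))                  # value immediately to the right
p_half = prob_interval(0, Fraction(1, 2))          # includes the mass at 0
p_negative = prob_interval(-1, 0)                  # excludes the mass at 0
```

Because the interval $[-\infty; x)$ excludes its right endpoint, the jump at 0 appears only for arguments greater than 0, which is exactly continuity on the left.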

IV.  Let us now consider the basic set $E$ as an $n$-dimensional Euclidean space $R^n$, i.e., the set of all ordered $n$-tuples $\xi = \{x_1, x_2, \ldots, x_n\}$ of real numbers. Let $\mathfrak{F}$ consist, in this case, of all Borel point-sets⁵ of the space $R^n$. On the basis of reasoning analogous to that used in Example II, we need not investigate narrower systems of sets, for example the systems of $n$-dimensional intervals.

The role of probability function $P(A)$ will be played here, as always, by any non-negative and completely additive set-function defined on $\mathfrak{F}$ and satisfying the condition $P(E) = 1$. Such a set-function is determined uniquely if we assign its values

$$P(L_{a_1 a_2 \ldots a_n}) = F(a_1, a_2, \ldots, a_n) \tag{3}$$

for the special sets $L_{a_1 a_2 \ldots a_n}$, where $L_{a_1 a_2 \ldots a_n}$ represents the aggregate of all $\xi$ for which $x_i < a_i$ $(i = 1, 2, \ldots, n)$.

For our function $F(a_1, a_2, \ldots, a_n)$ we may choose any function which for each variable is non-decreasing and continuous on the left, and which satisfies the following conditions:

$$\lim_{a_i \to -\infty} F(a_1, a_2, \ldots, a_n) = F(a_1, \ldots, a_{i-1}, -\infty, a_{i+1}, \ldots, a_n) = 0, \qquad i = 1, 2, \ldots, n,$$

$$\lim_{a_1 \to +\infty,\, a_2 \to +\infty,\, \ldots,\, a_n \to +\infty} F(a_1, a_2, \ldots, a_n) = F(+\infty, +\infty, \ldots, +\infty) = 1.$$

$F(a_1, a_2, \ldots, a_n)$ is called the distribution function of the variables $x_1, x_2, \ldots, x_n$.


3  Cf., for example, Lebesgue, Leçons sur l'intégration, 1928, p. 152-156.
4  See the previous note.
5  For a definition of Borel sets in $R^n$ see Hausdorff, Mengenlehre, 1927, pp. 177-181.

The investigation of fields of probability of the above type is sufficient for all classical problems in the theory of probability⁶. In particular, a probability function in $R^n$ can be defined thus: We take any non-negative point function $f(x_1, x_2, \ldots, x_n)$ defined in $R^n$, such that

$$\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n = 1$$

and set

$$P(A) = \int\!\!\int \cdots \int_A f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n. \tag{5}$$

$f(x_1, x_2, \ldots, x_n)$ is, in this case, the probability density at the point $(x_1, x_2, \ldots, x_n)$ (cf. Chap. III, § 2).

Another type of probability function in $R^n$ is obtained in the following manner: Let $\{\xi_i\}$ be a sequence of points of $R^n$, and let $\{p_i\}$ be a sequence of non-negative real numbers, such that $\sum p_i = 1$; we then set, as we did in Example I,

$$P(A) = {\sum}' p_i,$$

where the summation $\sum'$ extends over all indices $i$ for which $\xi_i$ belongs to $A$. The two types of probability functions in $R^n$ mentioned here do not exhaust all possibilities, but are usually considered sufficient for applications of the theory of probability. Nevertheless, we can imagine problems of interest for applications outside of this classical region in which elementary events are defined by means of an infinite number of coordinates. The corresponding fields of probability we shall study more closely after introducing several concepts needed for this purpose. (Cf. Chap. III, § 4).
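Formula (5) can be tried numerically on a simple case. The Python sketch below assumes, purely for illustration, the density $f(x_1, x_2) = e^{-(x_1 + x_2)}$ on the positive quadrant of $R^2$ and approximates $P(A)$ for a rectangle $A$ by a midpoint rule; the names `density` and `prob_rectangle` and the step count are hypothetical choices.

```python
import math

def density(x, y):
    """Assumed probability density on R^2: exp(-(x + y)) for x, y >= 0, else 0."""
    return math.exp(-(x + y)) if x >= 0 and y >= 0 else 0.0

def prob_rectangle(a1, b1, a2, b2, steps=400):
    """Approximate P(A) for A = [a1; b1) x [a2; b2) by the midpoint rule."""
    hx = (b1 - a1) / steps
    hy = (b2 - a2) / steps
    total = 0.0
    for i in range(steps):
        x = a1 + (i + 0.5) * hx
        for j in range(steps):
            y = a2 + (j + 0.5) * hy
            total += density(x, y)
    return total * hx * hy

approx = prob_rectangle(0.0, 1.0, 0.0, 1.0)
exact = (1.0 - math.exp(-1.0)) ** 2   # closed form for this particular density
```

For this density the integral over the unit square factors into a product of two one-dimensional integrals, which gives the closed form used for comparison.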


6  Cf., for example, R. v. Mises [1], pp. 13-19. Here the existence of probabilities for "all practically possible" sets of an $n$-dimensional space is required.


Chapter  III

RANDOM  VARIABLES

§  1.   Probability  Functions 

Given a mapping of the set $E$ into a set $E'$ consisting of any type of elements, i.e., a single-valued function $u(\xi)$ defined on $E$, whose values belong to $E'$. To each subset $A'$ of $E'$ we shall put into correspondence, as its pre-image in $E$, the set $u^{-1}(A')$ of all elements of $E$ which map onto elements of $A'$. Let $\mathfrak{F}^{(u)}$ be the system of all subsets $A'$ of $E'$, whose pre-images belong to the field $\mathfrak{F}$. $\mathfrak{F}^{(u)}$ will then also be a field. If $\mathfrak{F}$ happens to be a Borel field, the same will be true of $\mathfrak{F}^{(u)}$. We now set

$$P^{(u)}(A') = P\{u^{-1}(A')\}. \tag{1}$$

Since this set-function $P^{(u)}$, defined on $\mathfrak{F}^{(u)}$, satisfies with respect to the field $\mathfrak{F}^{(u)}$ all of our Axioms I - VI, it represents a probability function on $\mathfrak{F}^{(u)}$. Before turning to the proof of all the facts just stated, we shall formulate the following definition.

Definition.  Given a single-valued function $u(\xi)$ of a random event $\xi$. The function $P^{(u)}(A')$, defined by (1), is then called the probability function of $u$.

Remark 1: In studying fields of probability $(\mathfrak{F}, P)$, we call the function $P(A)$ simply the probability function, but $P^{(u)}(A')$ is called the probability function of $u$. In the case $u(\xi) = \xi$, $P^{(u)}(A')$ coincides with $P(A)$.

Remark 2: The event $u^{-1}(A')$ consists of the fact that $u(\xi)$ belongs to $A'$. Therefore, $P^{(u)}(A')$ is the probability of $u(\xi) \subset A'$.

We still have to prove the above-mentioned properties of $\mathfrak{F}^{(u)}$ and $P^{(u)}$. They follow, however, from a single fact, namely:

Lemma.  The sum, product, and difference of any pre-image sets $u^{-1}(A')$ are the pre-images of the corresponding sums, products, and differences of the original sets $A'$.

The proof of this lemma is left for the reader.

Let $A'$ and $B'$ be two sets of $\mathfrak{F}^{(u)}$. Their pre-images $A$ and $B$ belong then to $\mathfrak{F}$. Since $\mathfrak{F}$ is a field, the sets $AB$, $A \dotplus B$, and $A - B$ also belong to $\mathfrak{F}$; but these sets are the pre-images of the sets $A'B'$, $A' \dotplus B'$, and $A' - B'$, which thus belong to $\mathfrak{F}^{(u)}$. This proves that $\mathfrak{F}^{(u)}$ is a field. In the same manner it can be shown that if $\mathfrak{F}$ is a Borel field, so is $\mathfrak{F}^{(u)}$.

Furthermore, it is clear that

$$P^{(u)}(E') = P\{u^{-1}(E')\} = P(E) = 1.$$

That $P^{(u)}$ is always non-negative, is self-evident. It remains only to be shown, therefore, that $P^{(u)}$ is completely additive (cf. the end of § 1, Chap. II).

Let us assume that the sets $A'_n$, and therefore their pre-images $u^{-1}(A'_n)$, are disjoint. It follows that

$$P^{(u)}\Big(\sum_n A'_n\Big) = P\Big\{u^{-1}\Big(\sum_n A'_n\Big)\Big\} = P\Big\{\sum_n u^{-1}(A'_n)\Big\} = \sum_n P\{u^{-1}(A'_n)\} = \sum_n P^{(u)}(A'_n),$$

which proves the complete additivity of $P^{(u)}$.

In conclusion let us also note the following. Let $u_1(\xi)$ be a function mapping $E$ on $E'$, and $u_2(\xi')$ be another function, mapping $E'$ on $E''$. The product function $u_2 u_1(\xi)$ maps $E$ on $E''$. We shall now study the probability functions $P^{(u_1)}(A')$ and $P^{(u)}(A'')$ for the functions $u_1(\xi)$ and $u(\xi) = u_2 u_1(\xi)$. It is easy to show that these two probability functions are connected by the following relation:

$$P^{(u)}(A'') = P^{(u_1)}\{u_2^{-1}(A'')\}. \tag{2}$$
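Relation (2) can be checked on a finite example. The Python sketch below assumes a hypothetical six-point field with equal weights and two illustrative mappings `u1` and `u2`; the helper `pushforward` is a name introduced here, and it returns the values of $P^{(u)}$ on the one-point sets of the image.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite field of probability: six equally likely elementary events.
P = {xi: Fraction(1, 6) for xi in range(1, 7)}

def pushforward(prob, u):
    """Values of the probability function of u on one-point sets: P{u^{-1}(y)}."""
    image = defaultdict(Fraction)
    for xi, weight in prob.items():
        image[u(xi)] += weight
    return dict(image)

def u1(xi):
    """Assumed mapping of E on E' = {0, 1, 2}."""
    return xi % 3

def u2(y):
    """Assumed mapping of E' on E'' = {True, False}."""
    return y == 0

P_u1 = pushforward(P, u1)                          # probability function of u1
P_u = pushforward(P, lambda xi: u2(u1(xi)))        # probability function of u = u2 u1
P_via_u1 = pushforward(P_u1, u2)                   # right-hand side of relation (2)
```

Both routes give the same values, since the pre-image under the product function is the pre-image under `u1` of the pre-image under `u2`.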

§  2.   Definition  of  Random  Variables  and  of 
Distribution  Functions 

Definition.  A real single-valued function $x(\xi)$, defined on the basic set $E$, is called a random variable if for each choice of a real number $a$ the set $\{x < a\}$ of all $\xi$ for which the inequality $x < a$ holds true, belongs to the system of sets $\mathfrak{F}$.

This function $x(\xi)$ maps the basic set $E$ into the set $R^1$ of all real numbers. This function determines, as in § 1, a field $\mathfrak{F}^{(x)}$ of subsets of the set $R^1$. We may formulate our definition of random variable in this manner: A real function $x(\xi)$ is a random variable if and only if $\mathfrak{F}^{(x)}$ contains every interval of the form $(-\infty; a)$.

Since $\mathfrak{F}^{(x)}$ is a field, then along with the intervals $(-\infty; a)$ it contains all possible finite sums of half-open intervals $[a; b)$. If our field of probability is a Borel field, then $\mathfrak{F}$ and $\mathfrak{F}^{(x)}$ are Borel fields; therefore, in this case $\mathfrak{F}^{(x)}$ contains all Borel sets of $R^1$.

The probability function of a random variable we shall denote in the future by $P^{(x)}(A')$. It is defined for all sets of the field $\mathfrak{F}^{(x)}$. In particular, for the most important case, the Borel field of probability, $P^{(x)}$ is defined for all Borel sets of $R^1$.

Definition.  The function

$$F^{(x)}(a) = P^{(x)}(-\infty; a) = P\{x < a\},$$

where $-\infty$ and $+\infty$ are allowable values of $a$, is called the distribution function of the random variable $x$.
From the definition it follows at once that

$$F^{(x)}(-\infty) = 0, \qquad F^{(x)}(+\infty) = 1. \tag{1}$$

The probability of the realization of both inequalities $a \le x < b$, is obviously given by the formula

$$P\{x \subset [a; b)\} = F^{(x)}(b) - F^{(x)}(a). \tag{2}$$

From this, we have, for $a < b$,

$$F^{(x)}(a) \le F^{(x)}(b),$$

which means that $F^{(x)}(a)$ is a non-decreasing function. Now let $a_1 < a_2 < \cdots < a_n < \cdots < b$, with $a_n \to b$; then

$$\mathfrak{D}_n \{x \subset [a_n; b)\} = 0.$$

Therefore, in accordance with the continuity axiom,

$$F^{(x)}(b) - F^{(x)}(a_n) = P\{x \subset [a_n; b)\}$$

approaches zero as $n \to +\infty$. From this it is clear that $F^{(x)}(a)$ is continuous on the left.

In an analogous way we can prove the formulae:

$$\lim_{a \to -\infty} F^{(x)}(a) = F^{(x)}(-\infty) = 0, \tag{3}$$

$$\lim_{a \to +\infty} F^{(x)}(a) = F^{(x)}(+\infty) = 1. \tag{4}$$

If the field of probability $(\mathfrak{F}, P)$ is a Borel field, the values of the probability function $P^{(x)}(A)$ for all Borel sets $A$ of $R^1$ are uniquely determined by knowledge of the distribution function $F^{(x)}(a)$ (cf. § 3, III in Chap. II). Since our main interest lies in these values of $P^{(x)}(A)$, the distribution function plays a most significant role in all our future work.

If the distribution function $F^{(x)}(a)$ is differentiable, then we call its derivative with respect to $a$,

$$f^{(x)}(a) = \frac{d}{da} F^{(x)}(a),$$

the probability density of $x$ at the point $a$.

If also $F^{(x)}(a) = \int_{-\infty}^{a} f^{(x)}(a)\, da$ for each $a$, then we may express the probability function $P^{(x)}(A)$ for each Borel set $A$ in terms of $f^{(x)}(a)$ in the following manner:

$$P^{(x)}(A) = \int_A f^{(x)}(a)\, da. \tag{5}$$

In this case we call the distribution of $x$ continuous. And in the general case, we write, analogously

$$P^{(x)}(A) = \int_A dF^{(x)}(a). \tag{6}$$

All the concepts just introduced are capable of generalization for conditional probabilities. The set function

$$P_B^{(x)}(A) = P_B(x \subset A)$$

is the conditional probability function of $x$ under hypothesis $B$. The non-decreasing function

$$F_B^{(x)}(a) = P_B(x < a)$$

is the corresponding distribution function, and, finally (in the case where $F_B^{(x)}(a)$ is differentiable)

$$f_B^{(x)}(a) = \frac{d}{da} F_B^{(x)}(a)$$

is the conditional probability density of $x$ at the point $a$ under hypothesis $B$.

§  3.   Multi-dimensional  Distribution  Functions 

Let now $n$ random variables $x_1, x_2, \ldots, x_n$ be given. The point $x = (x_1, x_2, \ldots, x_n)$ of the $n$-dimensional space $R^n$ is a function of the elementary event $\xi$. Therefore, according to the general rules in § 1, we have a field $\mathfrak{F}' = \mathfrak{F}^{(x_1, x_2, \ldots, x_n)}$ consisting of subsets of space $R^n$ and a probability function $P^{(x_1, x_2, \ldots, x_n)}(A')$ defined on $\mathfrak{F}'$. This probability function is called the $n$-dimensional probability function of the random variables $x_1, x_2, \ldots, x_n$.

As follows directly from the definition of a random variable, the field $\mathfrak{F}'$ contains, for each choice of $i$ and $a_i$ $(i = 1, 2, \ldots, n)$, the set of all points in $R^n$ for which $x_i < a_i$. Therefore $\mathfrak{F}'$ also contains the intersection of the above sets, i.e. the set $L_{a_1 a_2 \ldots a_n}$ of all points of $R^n$ for which all the inequalities $x_i < a_i$ hold $(i = 1, 2, \ldots, n)$¹.

If we now denote as the $n$-dimensional half-open interval

$$[a_1, a_2, \ldots, a_n;\ b_1, b_2, \ldots, b_n)$$

the set of all points in $R^n$, for which $a_i \le x_i < b_i$, then we see at once that each such interval belongs to the field $\mathfrak{F}'$ since

$$[a_1, a_2, \ldots, a_n;\ b_1, b_2, \ldots, b_n) = L_{b_1 b_2 \ldots b_n} - L_{a_1 b_2 \ldots b_n} - L_{b_1 a_2 b_3 \ldots b_n} - \cdots - L_{b_1 b_2 \ldots b_{n-1} a_n}.$$

The Borel extension of the system of all $n$-dimensional half-open intervals consists of all Borel sets in $R^n$. From this it follows that in the case of a Borel field of probability, the field $\mathfrak{F}'$ contains all the Borel sets in the space $R^n$.

THEOREM: In the case of a Borel field of probability each Borel function $x = f(x_1, x_2, \ldots, x_n)$ of a finite number of random variables $x_1, x_2, \ldots, x_n$ is also a random variable.

All we need to prove this is to point out that the set of all points $(x_1, x_2, \ldots, x_n)$ in $R^n$ for which $x = f(x_1, x_2, \ldots, x_n) < a$, is a Borel set. In particular, all finite sums and products of random variables are also random variables.

Definition: The function

$$F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n) = P^{(x_1, x_2, \ldots, x_n)}(L_{a_1 a_2 \ldots a_n})$$

is called the $n$-dimensional distribution function of the random variables $x_1, x_2, \ldots, x_n$.

As in the one-dimensional case, we prove that the $n$-dimensional distribution function $F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n)$ is non-decreasing and continuous on the left in each variable. In analogy to equations (3) and (4) in § 2, we here have

$$\lim_{a_i \to -\infty} F(a_1, a_2, \ldots, a_n) = F(a_1, \ldots, a_{i-1}, -\infty, a_{i+1}, \ldots, a_n) = 0, \tag{7}$$

$$\lim_{a_1 \to +\infty,\, a_2 \to +\infty,\, \ldots,\, a_n \to +\infty} F(a_1, a_2, \ldots, a_n) = F(+\infty, +\infty, \ldots, +\infty) = 1. \tag{8}$$

The distribution function $F^{(x_1, x_2, \ldots, x_n)}$ gives directly the values of $P^{(x_1, x_2, \ldots, x_n)}$ only for the special sets $L_{a_1 a_2 \ldots a_n}$. If our field, however, is a Borel field, then² $P^{(x_1, x_2, \ldots, x_n)}$ is uniquely determined for all Borel sets in $R^n$ by knowledge of the distribution function $F^{(x_1, x_2, \ldots, x_n)}$.

If there exists the derivative

$$f(a_1, a_2, \ldots, a_n) = \frac{\partial^n}{\partial a_1\, \partial a_2 \cdots \partial a_n} F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n),$$

we call this derivative the $n$-dimensional probability density of the random variables $x_1, x_2, \ldots, x_n$ at the point $a_1, a_2, \ldots, a_n$. If also for every point $(a_1, a_2, \ldots, a_n)$

$$F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n) = \int_{-\infty}^{a_1} \int_{-\infty}^{a_2} \cdots \int_{-\infty}^{a_n} f(a_1, a_2, \ldots, a_n)\, da_1\, da_2 \cdots da_n,$$

then the distribution of $x_1, x_2, \ldots, x_n$ is called continuous. For every Borel set $A \subset R^n$, we have the equality

$$P^{(x_1, x_2, \ldots, x_n)}(A) = \int\!\!\int \cdots \int_A f(a_1, a_2, \ldots, a_n)\, da_1\, da_2 \cdots da_n. \tag{9}$$


1  The $a_i$ may also assume the infinite values $\pm\infty$.

In closing this section we shall make one more remark about the relationships between the various probability functions and distribution functions.

Given the substitution

$$S = \begin{pmatrix} 1, & 2, & \ldots, & n \\ i_1, & i_2, & \ldots, & i_n \end{pmatrix}$$

and let $r_S$ denote the transformation

$$x'_k = x_{i_k} \qquad (k = 1, 2, \ldots, n)$$

of space $R^n$ into itself. It is then obvious that

$$P^{(x_{i_1}, x_{i_2}, \ldots, x_{i_n})}(A) = P^{(x_1, x_2, \ldots, x_n)}\{r_S^{-1}(A)\}. \tag{10}$$

Now let $x' = p_k(x)$ be the "projection" of the space $R^n$ on the space $R^k$ $(k < n)$, so that the point $(x_1, x_2, \ldots, x_n)$ is mapped onto the point $(x_1, x_2, \ldots, x_k)$. Then, as a result of Formula (2) in § 1,

$$P^{(x_1, x_2, \ldots, x_k)}(A) = P^{(x_1, x_2, \ldots, x_n)}\{p_k^{-1}(A)\}. \tag{11}$$

For the corresponding distribution functions, we obtain from (10) and (11) the equations:

$$F^{(x_{i_1}, x_{i_2}, \ldots, x_{i_n})}(a_{i_1}, a_{i_2}, \ldots, a_{i_n}) = F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n), \tag{12}$$

$$F^{(x_1, x_2, \ldots, x_k)}(a_1, a_2, \ldots, a_k) = F^{(x_1, x_2, \ldots, x_n)}(a_1, \ldots, a_k, +\infty, \ldots, +\infty). \tag{13}$$


2  Cf. § 3, IV in the Second Chapter.
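Equations (12) and (13) can be verified directly for a discrete pair of random variables. In the Python sketch below the joint weights in `joint` are hypothetical numbers chosen for illustration, and the function names `F12`, `F21` and `F1` are introduced here for the distribution functions of $(x_1, x_2)$, $(x_2, x_1)$ and $x_1$.

```python
import math
from fractions import Fraction

# Hypothetical joint weights P{x1 = u, x2 = v} for a pair of random variables.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
         (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}

def F12(a1, a2):
    """Distribution function of (x1, x2): P{x1 < a1, x2 < a2}."""
    return sum(w for (u, v), w in joint.items() if u < a1 and v < a2)

def F21(a2, a1):
    """Distribution function of the permuted pair (x2, x1)."""
    return sum(w for (u, v), w in joint.items() if v < a2 and u < a1)

def F1(a1):
    """Distribution function of x1 alone: P{x1 < a1}."""
    return sum(w for (u, _), w in joint.items() if u < a1)

inf = math.inf
permutation_ok = all(F21(a2, a1) == F12(a1, a2)
                     for a1 in (0, 1, 2, inf) for a2 in (0, 1, 2, inf))  # equation (12)
projection_ok = all(F1(a1) == F12(a1, inf) for a1 in (0, 1, 2, inf))      # equation (13)
```

Setting an argument equal to $+\infty$ removes the corresponding inequality, which is why `F12(a1, inf)` reduces to the one-dimensional function `F1(a1)`.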

§  4.   Probabilities  in  Infinite-dimensional  Spaces 

In § 3 of the second chapter we have seen how to construct various fields of probability common in the theory of probability. We can imagine, however, interesting problems in which the elementary events are defined by means of an infinite number of coordinates. Let us take a set $M$ of indices $\mu$ (indexing set) of arbitrary cardinality $\mathfrak{m}$. The totality of all systems

$$\xi = \{x_\mu\}$$

of real numbers $x_\mu$, where $\mu$ runs through the entire set $M$, we shall call the space $R^M$ (in order to define an element $\xi$ in space $R^M$, we must put each element $\mu$ in set $M$ in correspondence with a real number $x_\mu$ or, equivalently, assign a real single-valued function $x_\mu$ of the element $\mu$, defined on $M$)³. If the set $M$ consists of the first $n$ natural numbers $1, 2, \ldots, n$, then $R^M$ is the ordinary $n$-dimensional space $R^n$. If we choose for the set $M$ all real numbers $R^1$, then the corresponding space $R^M = R^{R^1}$ will consist of all real functions

$$\xi(\mu) = x_\mu$$

of the real variable $\mu$.

We now take the set $R^M$ (with an arbitrary set $M$) as the basic set $E$. Let $\xi = \{x_\mu\}$ be an element in $E$; we shall denote by $p_{\mu_1 \mu_2 \ldots \mu_n}(\xi)$ the point $(x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n})$ of the $n$-dimensional space $R^n$. A subset $A$ of $E$ we shall call a cylinder set if it can be represented in the form

$$A = p^{-1}_{\mu_1 \mu_2 \ldots \mu_n}(A'),$$

where $A'$ is a subset of $R^n$. The class of all cylinder sets coincides, therefore, with the class of all sets which can be defined by relations of the form

$$f(x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n}) = 0. \tag{1}$$

In order to determine an arbitrary cylinder set $p^{-1}_{\mu_1 \mu_2 \ldots \mu_n}(A')$ by such a relation, we need only take as $f$ a function which equals 0 on $A'$, but outside of $A'$ equals unity.

A cylinder set is a Borel cylinder set if the corresponding set $A'$ is a Borel set. All Borel cylinder sets of the space $R^M$ form a field, which we shall henceforth denote by $\mathfrak{F}^M$⁴.

The Borel extension of the field $\mathfrak{F}^M$ we shall denote, as always, by $B\mathfrak{F}^M$. Sets in $B\mathfrak{F}^M$ we shall call Borel sets of the space $R^M$.


3  Cf. Hausdorff, Mengenlehre, 1927, p. 23.

Later on we shall give a method of constructing and operating with probability functions on $\mathfrak{F}^M$, and consequently, by means of the Extension Theorem, on $B\mathfrak{F}^M$ also. We obtain in this manner fields of probability sufficient for all purposes in the case that the set $M$ is denumerable. We can therefore handle all questions touching upon a denumerable sequence of random variables. But if $M$ is not denumerable, many simple and interesting subsets of $R^M$ remain outside of $B\mathfrak{F}^M$. For example, the set of all elements $\xi$ for which $x_\mu$ remains smaller than a fixed constant for all indices $\mu$, does not belong to the system $B\mathfrak{F}^M$ if the set $M$ is non-denumerable.

It is therefore desirable to try whenever possible to put each problem in such a form that the space of all elementary events $\xi$ has only a denumerable set of coordinates.

Let a probability function $P(A)$ be defined on $\mathfrak{F}^M$. We may then regard every coordinate $x_\mu$ of the elementary event $\xi$ as a random variable. In consequence, every finite group $(x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n})$ of these coordinates has an $n$-dimensional probability function $P_{\mu_1 \mu_2 \ldots \mu_n}(A)$ and a corresponding distribution function $F_{\mu_1 \mu_2 \ldots \mu_n}(a_1, a_2, \ldots, a_n)$. It is obvious that for every Borel cylinder set

$$A = p^{-1}_{\mu_1 \mu_2 \ldots \mu_n}(A'),$$

the following equation holds:

$$P(A) = P_{\mu_1 \mu_2 \ldots \mu_n}(A'),$$

where $A'$ is a Borel set of $R^n$. In this manner, the probability function $P$ is uniquely determined on the field $\mathfrak{F}^M$ of all cylinder sets by means of the values of all finite probability functions $P_{\mu_1 \mu_2 \ldots \mu_n}$ for all Borel sets of the corresponding spaces $R^n$. However, for Borel sets, the values of the probability functions $P_{\mu_1 \mu_2 \ldots \mu_n}$ are uniquely determined by means of the corresponding distribution functions. We have thus proved the following theorem:

The set of all finite-dimensional distribution functions $F_{\mu_1 \mu_2 \ldots \mu_n}$ uniquely determines the probability function $P(A)$ for all sets in $\mathfrak{F}^M$. If $P(A)$ is defined on $\mathfrak{F}^M$, then (according to the extension theorem) it is uniquely determined on $B\mathfrak{F}^M$ by the values of the distribution functions $F_{\mu_1 \mu_2 \ldots \mu_n}$.


4  From the above it follows that Borel cylinder sets are Borel sets definable by relations of type (1). Now let $A$ and $B$ be two Borel cylinder sets defined by the relations

$$f(x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n}) = 0, \qquad g(x_{\lambda_1}, x_{\lambda_2}, \ldots, x_{\lambda_m}) = 0.$$

Then we can define the sets $A \dotplus B$, $AB$, and $A - B$ respectively by the relations

$$f \cdot g = 0,$$
$$f^2 + g^2 = 0,$$
$$f^2 + \omega(g) = 0,$$

where $\omega(x) = 0$ for $x \ne 0$ and $\omega(0) = 1$. If $f$ and $g$ are Borel functions, so also are $f \cdot g$, $f^2 + g^2$ and $f^2 + \omega(g)$; therefore, $A \dotplus B$, $AB$ and $A - B$ are Borel cylinder sets. Thus we have shown that the system of sets $\mathfrak{F}^M$ is a field.

We may now ask the following. Under what conditions does a
system of distribution functions $F_{\mu_1 \mu_2 \dots \mu_n}$ given a priori define
a field of probability on $\mathfrak{F}^M$ (and, consequently, on $B\mathfrak{F}^M$)?

We must first note that every distribution function $F_{\mu_1 \mu_2 \dots \mu_n}$
must satisfy the conditions given in § 3, III of the second chapter;
indeed this is contained in the very concept of distribution
function. Besides, as a result of formulas (13) and (14) in § 2,
we have also the following relations:

$$F_{\mu_{i_1} \mu_{i_2} \dots \mu_{i_n}}(a_{i_1}, a_{i_2}, \dots, a_{i_n}) = F_{\mu_1 \mu_2 \dots \mu_n}(a_1, a_2, \dots, a_n), \tag{2}$$

$$F_{\mu_1 \mu_2 \dots \mu_k}(a_1, a_2, \dots, a_k) = F_{\mu_1 \mu_2 \dots \mu_n}(a_1, a_2, \dots, a_k, +\infty, \dots, +\infty), \tag{3}$$

where $k < n$ and $\begin{pmatrix} 1 & 2 & \dots & n \\ i_1 & i_2 & \dots & i_n \end{pmatrix}$ is an arbitrary permutation.
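The necessity of relations (2) and (3) can be checked mechanically on a small case. The following is only a sketch under stated assumptions: the joint law `pmf` of three coordinates on $\{0, 1\}^3$, the helper names and the grid of test points are all hypothetical, chosen for illustration. The distribution function is taken with strict inequalities, $F(a) = \mathsf{P}\{x < a\}$, in keeping with the left-continuity convention used in this book.

```python
from fractions import Fraction
from itertools import permutations, product
from math import inf

# Hypothetical joint law of three coordinates (x_1, x_2, x_3) on {0, 1}^3.
weights = {
    (0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 3, (0, 1, 1): 1,
    (1, 0, 0): 2, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 5,
}
total = sum(weights.values())
pmf = {point: Fraction(w, total) for point, w in weights.items()}


def F(indices, a):
    """Distribution function of the coordinates listed in `indices`:
    P{x_i < a_i for every listed i} (strict inequalities)."""
    return sum(
        p for point, p in pmf.items()
        if all(point[i] < a_i for i, a_i in zip(indices, a))
    )


grid = (0, Fraction(1, 2), 1, 2)  # test arguments for the a_i


def condition_2_holds():
    """Relation (2): permuting coordinates and arguments together changes nothing."""
    for a in product(grid, repeat=3):
        for perm in permutations(range(3)):
            if F(perm, tuple(a[i] for i in perm)) != F((0, 1, 2), a):
                return False
    return True


def condition_3_holds():
    """Relation (3): setting the last argument to +infinity gives the marginal."""
    for a in product(grid, repeat=2):
        if F((0, 1), a) != F((0, 1, 2), a + (inf,)):
            return False
    return True
```

Both checks succeed for any choice of `weights`, because every finite-dimensional distribution function here is computed from one and the same joint law. The sketch illustrates necessity only; it does not touch the sufficiency asserted by the theorem that follows.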

These  necessary  conditions  prove  also  to  be  sufficient,  as  will 
appear  from  the  following  theorem. 

Fundamental Theorem: Every system of distribution functions
$F_{\mu_1 \mu_2 \dots \mu_n}$, satisfying the conditions (2) and (3), defines a
probability function $\mathsf{P}(A)$ on $\mathfrak{F}^M$, which satisfies Axioms I - VI.
This probability function $\mathsf{P}(A)$ can be extended (by the extension
theorem) to $B\mathfrak{F}^M$ also.

Proof. Given the distribution functions $F_{\mu_1 \mu_2 \dots \mu_n}$, satisfying
the general conditions of Chap. II, § 3, III and also conditions (2)
and (3). Every distribution function $F_{\mu_1 \mu_2 \dots \mu_n}$ defines uniquely
a corresponding probability function $\mathsf{P}_{\mu_1 \mu_2 \dots \mu_n}$ for all Borel sets
of $R^n$ (cf. § 3). We shall deal in the future only with Borel sets
of $R^n$ and with Borel cylinder sets in $E$.

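For every cylinder set

$$A = p^{-1}_{\mu_1 \mu_2 \dots \mu_n}(A')$$

we set

$$\mathsf{P}(A) = \mathsf{P}_{\mu_1 \mu_2 \dots \mu_n}(A'). \tag{4}$$

Since the same cylinder set $A$ can be defined by various sets $A'$,
we must first show that formula (4) yields always the same
value for $\mathsf{P}(A)$.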

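Let $(x_{\mu_1}, x_{\mu_2}, \dots, x_{\mu_n})$ be a finite system of random variables
$x_\mu$. Proceeding from the probability function $\mathsf{P}_{\mu_1 \mu_2 \dots \mu_n}$ of these
random variables, we can, in accordance with the rules in § 3,
define the probability function $\mathsf{P}_{\mu_{i_1} \mu_{i_2} \dots \mu_{i_k}}$ of each subsystem
$(x_{\mu_{i_1}}, x_{\mu_{i_2}}, \dots, x_{\mu_{i_k}})$. From equations (2) and (3) it follows that
this probability function defined according to § 3 is the same as
the function $\mathsf{P}_{\mu_{i_1} \mu_{i_2} \dots \mu_{i_k}}$ given a priori. We shall now suppose that
the cylinder set $A$ is defined by means of

$$A = p^{-1}_{\mu_{i_1} \mu_{i_2} \dots \mu_{i_k}}(A')$$

and simultaneously by means of

$$A = p^{-1}_{\mu_{j_1} \mu_{j_2} \dots \mu_{j_m}}(A''),$$

where all random variables $x_{\mu_i}$ and $x_{\mu_j}$ belong to the system
$(x_{\mu_1}, x_{\mu_2}, \dots, x_{\mu_n})$, which is obviously not an essential restriction.
The conditions

$$(x_{\mu_{i_1}}, x_{\mu_{i_2}}, \dots, x_{\mu_{i_k}}) \subset A'$$

and

$$(x_{\mu_{j_1}}, x_{\mu_{j_2}}, \dots, x_{\mu_{j_m}}) \subset A''$$

are equivalent. Therefore

$$\mathsf{P}_{\mu_{i_1} \mu_{i_2} \dots \mu_{i_k}}(A') = \mathsf{P}_{\mu_1 \mu_2 \dots \mu_n}\{(x_{\mu_{i_1}}, x_{\mu_{i_2}}, \dots, x_{\mu_{i_k}}) \subset A'\}$$

$$= \mathsf{P}_{\mu_1 \mu_2 \dots \mu_n}\{(x_{\mu_{j_1}}, x_{\mu_{j_2}}, \dots, x_{\mu_{j_m}}) \subset A''\} = \mathsf{P}_{\mu_{j_1} \mu_{j_2} \dots \mu_{j_m}}(A''),$$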
which proves our statement concerning the uniqueness of the
definition of $\mathsf{P}(A)$.

Let us now prove that the field of probability $(\mathfrak{F}^M, \mathsf{P})$ satisfies
all the Axioms I - VI. Axiom I requires merely that $\mathfrak{F}^M$ be a field.
This fact has already been proven above. Moreover, for an arbitrary
$\mu$:

$$\mathsf{P}(E) = \mathsf{P}_\mu(R^1) = 1,$$

which proves that Axioms II and IV apply in this case. Finally,
from the definition of $\mathsf{P}(A)$ it follows at once that $\mathsf{P}(A)$ is
non-negative (Axiom III).

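It is only slightly more complicated to prove that Axiom V
is also satisfied. In order to do so, we investigate two cylinder sets

$$A = p^{-1}_{\mu_{i_1} \mu_{i_2} \dots \mu_{i_k}}(A')$$

and

$$B = p^{-1}_{\mu_{j_1} \mu_{j_2} \dots \mu_{j_m}}(B').$$

We shall assume that all variables $x_{\mu_i}$ and $x_{\mu_j}$ belong to one
inclusive finite system $(x_{\mu_1}, x_{\mu_2}, \dots, x_{\mu_n})$. If the sets $A$ and $B$ do not
intersect, the relations

$$(x_{\mu_{i_1}}, x_{\mu_{i_2}}, \dots, x_{\mu_{i_k}}) \subset A'$$

and

$$(x_{\mu_{j_1}}, x_{\mu_{j_2}}, \dots, x_{\mu_{j_m}}) \subset B'$$

are incompatible. Therefore

$$\mathsf{P}(A + B) = \mathsf{P}_{\mu_1 \mu_2 \dots \mu_n}\{(x_{\mu_{i_1}}, \dots, x_{\mu_{i_k}}) \subset A' \ \text{or} \ (x_{\mu_{j_1}}, \dots, x_{\mu_{j_m}}) \subset B'\}$$

$$= \mathsf{P}_{\mu_1 \mu_2 \dots \mu_n}\{(x_{\mu_{i_1}}, \dots, x_{\mu_{i_k}}) \subset A'\} + \mathsf{P}_{\mu_1 \mu_2 \dots \mu_n}\{(x_{\mu_{j_1}}, \dots, x_{\mu_{j_m}}) \subset B'\} = \mathsf{P}(A) + \mathsf{P}(B),$$

which concludes our proof.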
Only Axiom VI remains. Let

$$A_1 \supset A_2 \supset \dots \supset A_n \supset \dots$$

be a decreasing sequence of cylinder sets satisfying the condition

$$\lim \mathsf{P}(A_n) = L > 0.$$

We shall prove that the product of all sets $A_n$ is not empty. We
may assume, without essentially restricting the problem, that in
the definition of the first $n$ cylinder sets $A_k$, only the first $n$
coordinates $x_{\mu_k}$ in the sequence

$$x_{\mu_1}, x_{\mu_2}, \dots, x_{\mu_n}, \dots$$

occur, i.e.

$$A_n = p^{-1}_{\mu_1 \mu_2 \dots \mu_n}(B_n).$$

For brevity we set

$$\mathsf{P}_{\mu_1 \mu_2 \dots \mu_n}(B) = \mathsf{P}_n(B);$$

then, obviously,

$$\mathsf{P}_n(B_n) = \mathsf{P}(A_n) \geq L > 0.$$

In each set $B_n$ it is possible to find a closed bounded set $U_n$ such
that

$$\mathsf{P}_n(B_n - U_n) \leq \frac{\varepsilon}{2^n}.$$

From this inequality we have for the set

$$V_n = p^{-1}_{\mu_1 \mu_2 \dots \mu_n}(U_n)$$

the inequality

$$\mathsf{P}(A_n - V_n) \leq \frac{\varepsilon}{2^n}. \tag{5}$$

Let, moreover,

$$W_n = V_1 V_2 \dots V_n.$$

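From (5) it follows that

$$\mathsf{P}(A_n - W_n) \leq \varepsilon.$$

Since $W_n \subset V_n \subset A_n$, it follows that

$$\mathsf{P}(W_n) \geq \mathsf{P}(A_n) - \varepsilon \geq L - \varepsilon.$$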

If $\varepsilon$ is sufficiently small, $\mathsf{P}(W_n) > 0$ and $W_n$ is not empty. We
shall now choose in each set $W_n$ a point $\xi^{(n)}$ with the coordinates
$x_\mu^{(n)}$. Every point $\xi^{(n+p)}$, $p = 0, 1, 2, \dots$, belongs to the set $V_n$;
therefore

$$(x_{\mu_1}^{(n+p)}, x_{\mu_2}^{(n+p)}, \dots, x_{\mu_n}^{(n+p)}) = p_{\mu_1 \mu_2 \dots \mu_n}(\xi^{(n+p)}) \subset U_n.$$

Since the sets $U_n$ are bounded we may (by the diagonal method)
choose from the sequence $\{\xi^{(n)}\}$ a subsequence

$$\xi^{(n_1)}, \xi^{(n_2)}, \dots, \xi^{(n_i)}, \dots$$

for which the corresponding coordinates $x_{\mu_k}^{(n_i)}$ tend for any $k$ to
a definite limit $x_k$. Let, finally, $\xi$ be a point in set $E$ with the
coordinates

$$x_{\mu_k} = x_k, \qquad x_\mu = 0, \quad \mu \neq \mu_k, \quad k = 1, 2, 3, \dots$$

As the limit of the sequence $(x_1^{(n_i)}, x_2^{(n_i)}, \dots, x_k^{(n_i)})$, $i = 1, 2, 3, \dots$, the
point $(x_1, x_2, \dots, x_k)$ belongs to the set $U_k$. Therefore, $\xi$ belongs to

$$A_k \supset V_k = p^{-1}_{\mu_1 \mu_2 \dots \mu_k}(U_k)$$

for any $k$ and therefore to the product

$$A = \mathfrak{D}_k A_k.$$

§  5.   Equivalent  Random  Variables ;  Various  Kinds  of  Convergence 

Starting  with  this  paragraph,  we  deal  exclusively  with  Borel 
fields  of  probability.  As  we  have  already  explained  in  §  2  of  the 
second  chapter,  this  does  not  constitute  any  essential  restriction 
on  our  investigations. 

Two random variables $x$ and $y$ are called equivalent, if the
probability of the relation $x \neq y$ is equal to zero. It is obvious that
two equivalent random variables have the same probability function:

$$\mathsf{P}^{(x)}(A) = \mathsf{P}^{(y)}(A).$$

Therefore, the distribution functions $F^{(x)}$ and $F^{(y)}$ are also
identical. In many problems in the theory of probability we may
substitute for any random variable any equivalent variable.
Now let

$$x_1, x_2, \dots, x_n, \dots \tag{1}$$

be a sequence of random variables. Let us study the set $A$ of all
elementary events $\xi$ for which the sequence (1) converges. If we
denote by $A_{np}^{(m)}$ the sets of $\xi$ for which all the following inequalities
hold

$$|x_{n+k} - x_n| < \frac{1}{m}, \qquad k = 1, 2, \dots, p,$$

then we obtain at once

$$A = \mathfrak{D}_m \mathfrak{S}_n \mathfrak{D}_p A_{np}^{(m)}. \tag{2}$$

According to § 3, the set $A_{np}^{(m)}$ always belongs to the field $\mathfrak{F}$. The
relation (2) shows that $A$, too, belongs to $\mathfrak{F}$. We may, therefore,
speak of the probability of convergence of a sequence of random
variables, for it always has a perfectly definite meaning.

Now let the probability $\mathsf{P}(A)$ of the convergence set $A$ be
equal to unity. We may then state that the sequence (1) converges
with the probability one to a random variable $x$, where
the random variable $x$ is uniquely defined except for equivalence.
To determine such a random variable we set

$$x = \lim_{n \to \infty} x_n$$

on $A$, and $x = 0$ outside of $A$. We have to show that $x$ is a random
variable, in other words, that the set $A(a)$ of the elements $\xi$ for
which $x < a$, belongs to $\mathfrak{F}$. But

$$A(a) = A\, \mathfrak{S}_n \mathfrak{D}_p \{x_{n+p} < a\}$$

in case $a \leq 0$, and

$$A(a) = A\, \mathfrak{S}_n \mathfrak{D}_p \{x_{n+p} < a\} + \bar{A}$$

in the opposite case, from which our statement follows at once.
If the probability of convergence of the sequence (1) to $x$
equals one, then we say that the sequence (1) converges almost
surely to $x$. However, for the theory of probability, another
conception of convergence is possibly more important.

Definition. The sequence $x_1, x_2, \dots, x_n, \dots$ of random variables
converges in probability (converge en probabilité) to the
random variable $x$, if for any $\varepsilon > 0$, the probability

$$\mathsf{P}\{|x_n - x| > \varepsilon\}$$

tends toward zero as $n \to \infty$ 5.

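I. If the sequence (1) converges in probability to $x$ and also
to $x'$, then $x$ and $x'$ are equivalent. In fact

$$\mathsf{P}\left\{|x - x'| > \frac{1}{m}\right\} \leq \mathsf{P}\left\{|x_n - x| > \frac{1}{2m}\right\} + \mathsf{P}\left\{|x_n - x'| > \frac{1}{2m}\right\};$$

since the last probabilities are as small as we please for a sufficiently
large $n$ it follows that

$$\mathsf{P}\left\{|x - x'| > \frac{1}{m}\right\} = 0$$

and we obtain at once that

$$\mathsf{P}\{x \neq x'\} \leq \sum_m \mathsf{P}\left\{|x - x'| > \frac{1}{m}\right\} = 0.$$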

5 This concept is due to Bernoulli; its completely general treatment was
introduced by E. E. Slutsky (see [1]).

II. If the sequence (1) almost surely converges to $x$, then it
also converges to $x$ in probability. Let $A$ be the convergence set
of the sequence (1); then

$$1 = \mathsf{P}(A) \leq \lim_{n \to \infty} \mathsf{P}\{|x_{n+p} - x| < \varepsilon, \ p = 0, 1, 2, \dots\} \leq \lim_{n \to \infty} \mathsf{P}\{|x_n - x| < \varepsilon\},$$

from  which  the  convergence  in  probability  follows. 
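The converse of II does not hold in general. A standard illustration, which is not taken from the text above and is offered only as a sketch, is the sequence of indicators of the dyadic intervals $[j/2^k, (j+1)/2^k)$ on $E = [0, 1)$ with Lebesgue measure as $\mathsf{P}$, enumerated by $n = 2^k + j$. The function names below are hypothetical.

```python
from fractions import Fraction


def block_and_offset(n):
    """Write n >= 1 as n = 2**k + j with 0 <= j < 2**k and return (k, j)."""
    k = n.bit_length() - 1
    return k, n - 2**k


def x(n, omega):
    """x_n(omega): indicator of the interval [j/2^k, (j+1)/2^k) in E = [0, 1)."""
    k, j = block_and_offset(n)
    return 1 if Fraction(j, 2**k) <= omega < Fraction(j + 1, 2**k) else 0


def prob_exceeds(n):
    """P{|x_n - 0| > eps} for any 0 < eps < 1, i.e. the length of the interval."""
    k, _ = block_and_offset(n)
    return Fraction(1, 2**k)
```

Here `prob_exceeds(n)` tends to zero, so the sequence converges in probability to $0$. On the other hand, every point `omega` falls in exactly one interval of each block $2^k \leq n < 2^{k+1}$, so $x_n(\omega) = 1$ for infinitely many $n$ and the sequence converges at no point of $E$.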

III. For the convergence in probability of the sequence (1)
the following condition is both necessary and sufficient: For any
$\varepsilon > 0$ there exists an $n$ such that, for every $p > 0$, the following
inequality holds:

$$\mathsf{P}\{|x_{n+p} - x_n| > \varepsilon\} < \varepsilon.$$

Let $F_1(a), F_2(a), \dots, F_n(a), \dots, F(a)$ be the distribution
functions of the random variables $x_1, x_2, \dots, x_n, \dots, x$. If the
sequence $x_n$ converges in probability to $x$, the distribution function
$F(a)$ is uniquely determined by knowledge of the functions
$F_n(a)$. We have, in fact,

Theorem: If the sequence $x_1, x_2, \dots, x_n, \dots$ converges in
probability to $x$, the corresponding sequence of distribution functions
$F_n(a)$ converges at each point of continuity of $F(a)$ to the
distribution function $F(a)$ of $x$.

That $F(a)$ is really determined by the $F_n(a)$ follows from the
fact that $F(a)$, being a monotone function, continuous on the left,
is uniquely determined by its values at the points of continuity 6. To
prove the theorem we assume that $F$ is continuous at the point
$a$. Let $a' < a$; then in case $x < a'$, $x_n \geq a$ it is necessary that
$|x_n - x| > a - a'$. Therefore

$$\lim \mathsf{P}(x < a', \ x_n \geq a) = 0,$$

$$F(a') = \mathsf{P}(x < a') \leq \mathsf{P}(x_n < a) + \mathsf{P}(x < a', \ x_n \geq a) = F_n(a) + \mathsf{P}(x < a', \ x_n \geq a),$$

$$F(a') \leq \liminf F_n(a) + \lim \mathsf{P}(x < a', \ x_n \geq a),$$

$$F(a') \leq \liminf F_n(a). \tag{3}$$

In an analogous manner, we can prove that from $a'' > a$ there
follows the relation

$$F(a'') \geq \limsup F_n(a). \tag{4}$$

Since $F(a')$ and $F(a'')$ converge to $F(a)$ for $a' \to a$ and $a'' \to a$,
it follows from (3) and (4) that

$$\lim F_n(a) = F(a),$$

which proves our theorem.

6 In fact, it has at most only a countable set of discontinuities (see Lebesgue,
Leçons sur l'intégration, 1928, p. 50). Therefore, the points of continuity are
everywhere dense, and the value of the function $F(a)$ at a point of discontinuity
is determined as the limit of its values at the points of continuity
on its left.


Chapter  IV 


MATHEMATICAL  EXPECTATIONS1 

§  1.   Abstract  Lebesgue  Integrals 

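Let $x$ be a random variable and $A$ a set of $\mathfrak{F}$. Let us form, for a
positive $\lambda$, the sum

$$S_\lambda = \sum_{k=-\infty}^{+\infty} k\lambda \, \mathsf{P}\{k\lambda \leq x < (k+1)\lambda, \ \xi \subset A\}. \tag{1}$$

If this series converges absolutely for every $\lambda$, then as $\lambda \to 0$, $S_\lambda$
tends toward a definite limit, which is by definition the integral

$$\int_A x \, \mathsf{P}(dE). \tag{2}$$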

In  this  abstract  form  the  concept  of  an  integral  was  introduced 
by  Frechet2;  it  is  indispensable  for  the  theory  of  probability. 
(The  reader  will  see  in  the  following  paragraphs  that  the  usual 
definition  for  the  conditional  mathematical  expectation  of  the 
variable  x  under  hypothesis  A  coincides  with  the  definition  of 
the  integral  (2)  except  for  a  constant  factor.) 
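The sum (1) is easy to evaluate when $x$ takes only finitely many values. The following sketch takes $A = E$ and a hypothetical three-point law `pmf` (the values, probabilities and function names are assumptions made for illustration) and computes $S_\lambda$ exactly for a few values of $\lambda$.

```python
from fractions import Fraction
from math import floor

# Hypothetical discrete law of x: value -> probability (illustration only).
pmf = {
    Fraction(-3, 2): Fraction(1, 4),
    Fraction(1, 3): Fraction(1, 4),
    Fraction(2): Fraction(1, 2),
}


def lebesgue_sum(pmf, lam):
    """S_lambda = sum over k of k*lam * P{k*lam <= x < (k+1)*lam}, with A = E."""
    total = Fraction(0)
    for value, prob in pmf.items():
        k = floor(value / lam)  # the unique k with k*lam <= value < (k+1)*lam
        total += k * lam * prob
    return total


expectation = sum(value * prob for value, prob in pmf.items())
lams = [Fraction(1, 10**j) for j in range(4)]
sums = [lebesgue_sum(pmf, lam) for lam in lams]
```

Each value of $x$ is replaced by the left endpoint $k\lambda$ of its cell, so $S_\lambda$ never exceeds the weighted mean `expectation` and differs from it by less than $\lambda$. This is the sense in which $S_\lambda$ tends to the integral (2) as $\lambda \to 0$ in this simple case.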

We  shall  give  here  a  brief  survey  of  the  most  important 
properties  of  the  integrals  of  form  (2) .  The  reader  will  find  their 
proofs  in  every  textbook  on  real  variables,  although  the  proofs 
are  usually  carried  out  only  in  the  case  where  P(A)  is  the  Lebesgue 
measure  of  sets  in  Rn.  The  extension  of  these  proofs  to  the  general 
case  does  not  entail  any  new  mathematical  problem ;  for  the  most 
part  they  remain  word  for  word  the  same. 

I. If a random variable $x$ is integrable on $A$, then it is
integrable on each subset $A'$ of $A$ belonging to $\mathfrak{F}$.

1 As was stated in § 5 of the third chapter, we are considering in this, as well
as in the following chapters, Borel fields of probability only.

2 Frechet, Sur l'intégrale d'une fonctionnelle étendue à un ensemble
abstrait, Bull. Soc. Math. France v. 43, 1915, p. 248.

II. If $x$ is integrable on $A$ and $A$ is decomposed into no
more than a countable number of non-intersecting sets $A_n$ of $\mathfrak{F}$,
then

$$\int_A x \, \mathsf{P}(dE) = \sum_n \int_{A_n} x \, \mathsf{P}(dE).$$

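III. If $x$ is integrable, $|x|$ is also integrable, and in that case

$$\left| \int_A x \, \mathsf{P}(dE) \right| \leq \int_A |x| \, \mathsf{P}(dE).$$

IV. If in each event $\xi$, the inequalities $0 \leq y \leq x$ hold, then
along with $x$, $y$ is also integrable3, and in that case

$$\int_A y \, \mathsf{P}(dE) \leq \int_A x \, \mathsf{P}(dE).$$

V. If $m \leq x \leq M$ where $m$ and $M$ are two constants, then

$$m \mathsf{P}(A) \leq \int_A x \, \mathsf{P}(dE) \leq M \mathsf{P}(A).$$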

VI. If $x$ and $y$ are integrable, and $K$ and $L$ are two real constants,
then $Kx + Ly$ is also integrable, and in this case

$$\int_A (Kx + Ly) \, \mathsf{P}(dE) = K \int_A x \, \mathsf{P}(dE) + L \int_A y \, \mathsf{P}(dE).$$

VII. If the series

$$\sum_n \int_A |x_n| \, \mathsf{P}(dE)$$

converges, then the series

$$\sum_n x_n = x$$

converges at each point of set $A$ with the exception of a certain
set $B$ for which $\mathsf{P}(B) = 0$. If we set $x = 0$ everywhere except on
$A - B$, then

$$\int_A x \, \mathsf{P}(dE) = \sum_n \int_A x_n \, \mathsf{P}(dE).$$


3 It is assumed that $y$ is a random variable, i.e., in the terminology of the
general theory of integration, measurable with respect to $\mathfrak{F}$.

VIII. If $x$ and $y$ are equivalent ($\mathsf{P}\{x \neq y\} = 0$), then for
every set $A$ of $\mathfrak{F}$

$$\int_A x \, \mathsf{P}(dE) = \int_A y \, \mathsf{P}(dE). \tag{3}$$

IX. If (3) holds for every set $A$ of $\mathfrak{F}$, then $x$ and $y$ are
equivalent.

From  the  foregoing  definition  of  an  integral  we  also  obtain 
the  following  property,  which  is  not  found  in  the  usual  Lebesgue 
theory. 

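X. Let $\mathsf{P}_1(A)$ and $\mathsf{P}_2(A)$ be two probability functions defined
on the same field $\mathfrak{F}$, $\mathsf{P}(A) = \mathsf{P}_1(A) + \mathsf{P}_2(A)$, and let $x$ be integrable
on $A$ relative to $\mathsf{P}_1(A)$ and $\mathsf{P}_2(A)$. Then

$$\int_A x \, \mathsf{P}(dE) = \int_A x \, \mathsf{P}_1(dE) + \int_A x \, \mathsf{P}_2(dE).$$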

XI. Every bounded random variable is integrable.

§ 2.   Absolute and Conditional Mathematical Expectations

Let $x$ be a random variable. The integral

$$\mathsf{E}(x) = \int_E x \, \mathsf{P}(dE)$$

is  called  in  the  theory  of  probability  the  mathematical  expectation 
of  the  variable  x.  From  the  properties  III,  IV,  V,  VI,  VII,  VIII, 
XI,  it  follows  that 

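I. $|\mathsf{E}(x)| \leq \mathsf{E}(|x|)$;

II. $\mathsf{E}(y) \leq \mathsf{E}(x)$ if $0 \leq y \leq x$ everywhere;

III. $\inf(x) \leq \mathsf{E}(x) \leq \sup(x)$;

IV. $\mathsf{E}(Kx + Ly) = K\mathsf{E}(x) + L\mathsf{E}(y)$;

V. $\mathsf{E}\left(\sum_n x_n\right) = \sum_n \mathsf{E}(x_n)$, if the series $\sum_n \mathsf{E}(|x_n|)$ converges;

VI. If $x$ and $y$ are equivalent then

$$\mathsf{E}(x) = \mathsf{E}(y).$$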

VII.  Every  bounded  random  variable  has  a  mathematical 
expectation. 

From the definition of the integral, we have

$$\mathsf{E}(x) = \lim \sum_{k=-\infty}^{+\infty} km \, \mathsf{P}\{km \leq x < (k+1)m\}$$

$$= \lim \sum_{k=-\infty}^{+\infty} km \, \{F((k+1)m) - F(km)\}.$$

The second line is nothing more than the usual definition of the
Stieltjes integral

$$\int_{-\infty}^{+\infty} a \, dF^{(x)}(a) = \mathsf{E}(x). \tag{1}$$

Formula  (1)  may  therefore  serve  as  a  definition  of  the  mathe- 
matical expectation  E(x). 

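Now let $u$ be a function of the elementary event $\xi$, and $x$ be a
random variable defined as a single-valued function $x = x(u)$
of $u$. Then

$$\mathsf{P}\{km \leq x < (k+1)m\} = \mathsf{P}^{(u)}\{km \leq x(u) < (k+1)m\},$$

where $\mathsf{P}^{(u)}(A)$ is the probability function of $u$. It then follows
from the definition of the integral that

$$\int_E x \, \mathsf{P}(dE) = \int_{E^{(u)}} x(u) \, \mathsf{P}^{(u)}(dE^{(u)})$$

and, therefore,

$$\mathsf{E}(x) = \int_{E^{(u)}} x(u) \, \mathsf{P}^{(u)}(dE^{(u)}) \tag{2}$$

where $E^{(u)}$ denotes the set of all possible values of $u$.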

In particular, when $u$ itself is a random variable we have

$$\mathsf{E}(x) = \int_E x \, \mathsf{P}(dE) = \int_{R^1} x(u) \, \mathsf{P}^{(u)}(dR^1) = \int_{-\infty}^{+\infty} x(a) \, dF^{(u)}(a). \tag{3}$$

When $x(u)$ is continuous, the last integral in (3) is the ordinary
Stieltjes integral. We must note, however, that the integral

$$\int_{-\infty}^{+\infty} x(a) \, dF^{(u)}(a)$$

can exist even when the mathematical expectation $\mathsf{E}(x)$ does not.
For the existence of $\mathsf{E}(x)$, it is necessary and sufficient that the
integral

$$\int_{-\infty}^{+\infty} |x(a)| \, dF^{(u)}(a)$$

be finite4.

4 Cf. V. Glivenko, Sur les valeurs probables de fonctions, Rend. Accad.
Lincei v. 8, 1928, pp. 480-483.

If $u$ is a point $(u_1, u_2, \dots, u_n)$ of the space $R^n$, then as a result
of (2):

$$\mathsf{E}(x) = \int\!\!\int \dots \int_{R^n} x(u_1, u_2, \dots, u_n) \, \mathsf{P}^{(u_1, u_2, \dots, u_n)}(dR^n). \tag{4}$$

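We have already seen that the conditional probability $\mathsf{P}_B(A)$
possesses all the properties of a probability function. The corresponding
integral

$$\mathsf{E}_B(x) = \int_E x \, \mathsf{P}_B(dE) \tag{5}$$

we call the conditional mathematical expectation of the random
variable $x$ with respect to the event $B$. Since

$$\mathsf{P}_B(\bar{B}) = 0, \qquad \int_{\bar{B}} x \, \mathsf{P}_B(dE) = 0,$$

we obtain from (5) the equation

$$\mathsf{E}_B(x) = \int_E x \, \mathsf{P}_B(dE) = \int_B x \, \mathsf{P}_B(dE) + \int_{\bar{B}} x \, \mathsf{P}_B(dE) = \int_B x \, \mathsf{P}_B(dE).$$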

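We recall that in case $A \subset B$,

$$\mathsf{P}_B(A) = \frac{\mathsf{P}(AB)}{\mathsf{P}(B)} = \frac{\mathsf{P}(A)}{\mathsf{P}(B)};$$

we thus obtain

$$\mathsf{E}_B(x) = \frac{1}{\mathsf{P}(B)} \int_B x \, \mathsf{P}(dE), \tag{6}$$

$$\int_B x \, \mathsf{P}(dE) = \mathsf{P}(B) \, \mathsf{E}_B(x). \tag{7}$$

From (6) and the equality

$$\int_{A+B} x \, \mathsf{P}(dE) = \int_A x \, \mathsf{P}(dE) + \int_B x \, \mathsf{P}(dE)$$

we obtain at last

$$\mathsf{E}_{A+B}(x) = \frac{\mathsf{P}(A) \, \mathsf{E}_A(x) + \mathsf{P}(B) \, \mathsf{E}_B(x)}{\mathsf{P}(A + B)} \tag{8}$$

and, in particular, we have the formula

$$\mathsf{E}(x) = \mathsf{P}(A) \, \mathsf{E}_A(x) + \mathsf{P}(\bar{A}) \, \mathsf{E}_{\bar{A}}(x). \tag{9}$$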
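Formulas (6) and (9) can be verified by direct computation on a finite field of probability. The sketch below assumes a fair die, the random variable $x(\xi) = \xi^2$ and the event $A$ = "even result"; these choices, like the helper names, are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

# Hypothetical finite field of probability: a fair die.
E = range(1, 7)
P = {xi: Fraction(1, 6) for xi in E}
x = {xi: xi * xi for xi in E}      # the random variable x(xi) = xi squared
A = {2, 4, 6}
A_bar = set(E) - A                 # the complementary event


def prob(S):
    """P(S) for a set S of elementary events."""
    return sum(P[xi] for xi in S)


def integral(S):
    """The integral of x over S with respect to P."""
    return sum(x[xi] * P[xi] for xi in S)


def cond_exp(S):
    """Conditional expectation E_S(x), computed by formula (6)."""
    return integral(S) / prob(S)


lhs = integral(E)                                              # E(x)
rhs = prob(A) * cond_exp(A) + prob(A_bar) * cond_exp(A_bar)    # right side of (9)
```

Both `lhs` and `rhs` are exact fractions, so their equality is a check of (9) for this example and not a numerical approximation.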
§ 3.   The Tchebycheff Inequality

Let $f(x)$ be a non-negative function of a real argument $x$,
which for $x \geq a$ never becomes smaller than $b > 0$. Then for any
random variable $x$

$$\mathsf{P}(x \geq a) \leq \frac{\mathsf{E}\{f(x)\}}{b}, \tag{1}$$

provided the mathematical expectation $\mathsf{E}\{f(x)\}$ exists. For,

$$\mathsf{E}\{f(x)\} = \int_E f(x) \, \mathsf{P}(dE) \geq \int_{\{x \geq a\}} f(x) \, \mathsf{P}(dE) \geq b \, \mathsf{P}(x \geq a),$$

from which (1) follows at once.

For example, for every positive $c$,

$$\mathsf{P}(x \geq a) \leq \frac{\mathsf{E}(e^{cx})}{e^{ca}}. \tag{2}$$

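Now let $f(x)$ be non-negative, even, and, for positive $x$,
non-decreasing. Then for every random variable $x$ and for any choice
of the constant $a > 0$ the following inequality holds

$$\mathsf{P}(|x| \geq a) \leq \frac{\mathsf{E}\{f(x)\}}{f(a)}. \tag{3}$$

In particular,

$$\mathsf{P}(|x - \mathsf{E}(x)| \geq a) \leq \frac{\mathsf{E}\{f(x - \mathsf{E}(x))\}}{f(a)}. \tag{4}$$

Especially important is the case $f(x) = x^2$. We then obtain from
(3) and (4)

$$\mathsf{P}(|x| \geq a) \leq \frac{\mathsf{E}(x^2)}{a^2}, \tag{5}$$

$$\mathsf{P}(|x - \mathsf{E}(x)| \geq a) \leq \frac{\mathsf{E}\{x - \mathsf{E}(x)\}^2}{a^2} = \frac{\sigma^2(x)}{a^2}, \tag{6}$$

where

$$\sigma^2(x) = \mathsf{E}\{x - \mathsf{E}(x)\}^2$$

is called the variance of the variable $x$. It is easy to calculate that

$$\sigma^2(x) = \mathsf{E}(x^2) - \{\mathsf{E}(x)\}^2.$$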
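Inequality (6) and the identity for the variance can be checked exactly on a small discrete law. The sketch below assumes a hypothetical three-point distribution `pmf`; for it the bound in (6) is attained at $a = 2$, which shows that the inequality cannot be improved without further assumptions.

```python
from fractions import Fraction

# Hypothetical law of x: value -> probability (illustration only).
pmf = {-2: Fraction(1, 8), 0: Fraction(3, 4), 2: Fraction(1, 8)}

mean = sum(v * p for v, p in pmf.items())                   # E(x)
second_moment = sum(v * v * p for v, p in pmf.items())      # E(x^2)
variance = sum((v - mean) ** 2 * p for v, p in pmf.items()) # E{x - E(x)}^2


def tail(a):
    """Left side of (6): P(|x - E(x)| >= a)."""
    return sum(p for v, p in pmf.items() if abs(v - mean) >= a)


def bound(a):
    """Right side of (6): variance divided by a squared."""
    return variance / Fraction(a) ** 2
```

For this law `mean` is $0$ and `variance` is $1$, so `tail(2)` and `bound(2)` both equal $1/4$, while for $a = 1$ and $a = 3$ the inequality is strict.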

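If $f(x)$ is bounded:

$$|f(x)| \leq K,$$

then a lower bound for $\mathsf{P}(|x| \geq a)$ can be found. For

$$\mathsf{E}(f(x)) = \int_E f(x) \, \mathsf{P}(dE) = \int_{\{|x| < a\}} f(x) \, \mathsf{P}(dE) + \int_{\{|x| \geq a\}} f(x) \, \mathsf{P}(dE)$$

$$\leq f(a) \, \mathsf{P}(|x| < a) + K \, \mathsf{P}(|x| \geq a) \leq f(a) + K \, \mathsf{P}(|x| \geq a)$$

and therefore

$$\mathsf{P}(|x| \geq a) \geq \frac{\mathsf{E}\{f(x)\} - f(a)}{K}. \tag{7}$$

If instead of $f(x)$ the random variable $x$ itself is bounded,

$$|x| \leq M,$$

then $f(x) \leq f(M)$, and instead of (7), we have the formula

$$\mathsf{P}(|x| \geq a) \geq \frac{\mathsf{E}\{f(x)\} - f(a)}{f(M)}. \tag{8}$$

In the case $f(x) = x^2$, we have from (8)

$$\mathsf{P}(|x| \geq a) \geq \frac{\mathsf{E}(x^2) - a^2}{M^2}. \tag{9}$$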
§  4.   Some  Criteria  for  Convergence 

Let

$$x_1, x_2, \dots, x_n, \dots \tag{1}$$

be a sequence of random variables and $f(x)$ be a non-negative,
even, and for positive $x$ a monotonically increasing function5.
Then the following theorems are true:

5 Therefore $f(x) > 0$ if $x \neq 0$.

I. In order that the sequence (1) converge in probability the
following condition is sufficient: For each $\varepsilon > 0$ there exists an $n$
such that for every $p > 0$, the following inequality holds:

$$\mathsf{E}\{f(x_{n+p} - x_n)\} < \varepsilon. \tag{2}$$

II. In order that the sequence (1) converge in probability to
the random variable $x$, the following condition is sufficient:

$$\lim_{n \to +\infty} \mathsf{E}\{f(x_n - x)\} = 0. \tag{3}$$

III. If $f(x)$ is bounded and continuous and $f(0) = 0$, then
conditions I and II are also necessary.

IV. If $f(x)$ is continuous, $f(0) = 0$, and the totality of all
$x_1, x_2, \dots, x_n, \dots, x$ is bounded, then conditions I and II are also
necessary.

From II and IV, we obtain in particular

V. In order that sequence (1) converge in probability to $x$,
it is sufficient that

$$\lim \mathsf{E}(x_n - x)^2 = 0. \tag{4}$$

If also the totality of all $x_1, x_2, \dots, x_n, \dots, x$ is bounded, then the
condition is also necessary.

For proofs of I - IV see Slutsky [1] and Frechet [1]. However,
these theorems follow almost immediately from formulas
(3) and (8) of the preceding section.

§  5.   Differentiation  and  Integration  of  Mathematical  Expectations 
with  Respect  to  a  Parameter 

Let us put each elementary event $\xi$ into correspondence with a
definite  real  function  x(t)  of  a  real  variable  t.  We  say  that  x(t) 
is  a  random  function  if  for  every  fixed  t,  the  variable  x(t)  is  a 
random  variable.  The  question  now  arises,  under  what  conditions 
can  the  mathematical  expectation  sign  be  interchanged  with  the 
integration  and  differentiation  signs.  The  two  following  theorems, 
though  they  do  not  exhaust  the  problem,  can  nevertheless  give  a 
satisfactory  answer  to  this  question  in  many  simple  cases. 

Theorem I: If the mathematical expectation $\mathsf{E}[x(t)]$ is finite
for any $t$, and $x(t)$ is always differentiable for any $t$, while the
derivative $x'(t)$ of $x(t)$ with respect to $t$ is always less in absolute
value than some constant $M$, then

$$\frac{d}{dt} \mathsf{E}(x(t)) = \mathsf{E}(x'(t)).$$

Theorem II: If $x(t)$ always remains less, in absolute value,
than some constant $K$ and is integrable in the Riemann sense, then

$$\int_a^b \mathsf{E}(x(t)) \, dt = \mathsf{E}\left[\int_a^b x(t) \, dt\right],$$

provided $\mathsf{E}[x(t)]$ is integrable in the Riemann sense.

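Proof of Theorem I. Let us first note that $x'(t)$ as the limit of
the random variables

$$\frac{x(t+h) - x(t)}{h}, \qquad h = 1, \frac{1}{2}, \dots, \frac{1}{n}, \dots$$

is also a random variable. Since $x'(t)$ is bounded, the mathematical
expectation $\mathsf{E}[x'(t)]$ exists (Property VII of mathematical
expectation, in § 2). Let us choose a fixed $t$ and denote
by $A$ the event

$$\left| \frac{x(t+h) - x(t)}{h} - x'(t) \right| > \varepsilon.$$

The probability $\mathsf{P}(A)$ tends to zero as $h \to 0$ for every $\varepsilon > 0$. Since

$$\left| \frac{x(t+h) - x(t)}{h} \right| \leq M, \qquad |x'(t)| \leq M$$

holds everywhere, and moreover in the case $\bar{A}$

$$\left| \frac{x(t+h) - x(t)}{h} - x'(t) \right| \leq \varepsilon,$$

then

$$\left| \frac{\mathsf{E}x(t+h) - \mathsf{E}x(t)}{h} - \mathsf{E}x'(t) \right| \leq \mathsf{E}\left| \frac{x(t+h) - x(t)}{h} - x'(t) \right|$$

$$= \mathsf{P}(A) \, \mathsf{E}_A\left| \frac{x(t+h) - x(t)}{h} - x'(t) \right| + \mathsf{P}(\bar{A}) \, \mathsf{E}_{\bar{A}}\left| \frac{x(t+h) - x(t)}{h} - x'(t) \right| \leq 2M\mathsf{P}(A) + \varepsilon.$$

We may choose the $\varepsilon > 0$ arbitrarily, and $\mathsf{P}(A)$ is arbitrarily
small for any sufficiently small $h$. Therefore

$$\frac{d}{dt} \mathsf{E}x(t) = \lim_{h \to 0} \frac{\mathsf{E}x(t+h) - \mathsf{E}x(t)}{h} = \mathsf{E}x'(t),$$

which was to be proved.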
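Proof of Theorem II. Let

$$S_n = h \sum_{k=1}^{n} x(a + kh), \qquad h = \frac{b - a}{n}.$$

Since $S_n$ converges to $J = \int_a^b x(t) \, dt$, we can choose for any
$\varepsilon > 0$ an $N$ such that from $n \geq N$ there follows the inequality

$$\mathsf{P}(A) = \mathsf{P}\{|S_n - J| > \varepsilon\} < \varepsilon.$$

If we set

$$S_n^* = h \sum_{k=1}^{n} \mathsf{E}x(a + kh) = \mathsf{E}(S_n),$$

then

$$|S_n^* - \mathsf{E}(J)| = |\mathsf{E}(S_n - J)| \leq \mathsf{E}|S_n - J|$$

$$= \mathsf{P}(A) \, \mathsf{E}_A|S_n - J| + \mathsf{P}(\bar{A}) \, \mathsf{E}_{\bar{A}}|S_n - J| \leq 2K\mathsf{P}(A) + \varepsilon \leq (2K + 1)\varepsilon.$$

Therefore, $S_n^*$ converges to $\mathsf{E}(J)$, from which results the equation

$$\int_a^b \mathsf{E}x(t) \, dt = \lim S_n^* = \mathsf{E}(J).$$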

Theorem II can easily be generalized for double and triple
and higher order multiple integrals. We shall give an application
of this theorem to one example in geometric probability. Let $G$ be a
measurable region of the plane whose shape depends on chance;
in other words, let us assign to every elementary event $\xi$ of a field
of probability a definite measurable plane region $G$. We shall
denote by $J$ the area of the region $G$, and by $\mathsf{P}(x, y)$ the probability
that the point $(x, y)$ belongs to the region $G$. Then

$$\mathsf{E}(J) = \int\!\!\int \mathsf{P}(x, y) \, dx \, dy.$$

To prove this it is sufficient to note that

$$J = \int\!\!\int f(x, y) \, dx \, dy,$$

$$\mathsf{P}(x, y) = \mathsf{E}f(x, y),$$

where $f(x, y)$ is the characteristic function of the region $G$
($f(x, y) = 1$ on $G$ and $f(x, y) = 0$ outside of $G$)6.

6 Cf. A. Kolmogorov and M. Leontovich, Zur Berechnung der mittleren
Brownschen Fläche, Physik. Zeitschr. d. Sovietunion, v. 4, 1933.


Chapter  V 

CONDITIONAL  PROBABILITIES  AND 
MATHEMATICAL  EXPECTATIONS 

§  1.   Conditional  Probabilities 

In § 6, Chapter I, we defined the conditional probability, $\mathsf{P}_{\mathfrak{A}}(B)$,
of the event $B$ with respect to trial $\mathfrak{A}$. It was there assumed that $\mathfrak{A}$
allows of only a finite number of different possible results. We
can, however, define $\mathsf{P}_{\mathfrak{A}}(B)$ also for the case of an $\mathfrak{A}$ with an infinite
set of possible results, i.e. the case in which the set $E$ is partitioned
into an infinite number of non-intersecting subsets. In particular,
we obtain such a partitioning if we consider an arbitrary function
$u$ of $\xi$ and define as elements of the partition $\mathfrak{A}_u$ the sets $u$ = constant.
The conditional probability $\mathsf{P}_{\mathfrak{A}_u}(B)$ we also denote by $\mathsf{P}_u(B)$.
Any partitioning $\mathfrak{A}$ of the set $E$ can be defined as the partitioning
$\mathfrak{A}_u$ which is "induced" by a function $u$ of $\xi$, if one assigns to every $\xi$,
as $u(\xi)$, that set of the partitioning $\mathfrak{A}$ of $E$ which contains $\xi$.

Two functions $u$ and $u'$ of $\xi$ determine the same partitioning
$\mathfrak{A}_u = \mathfrak{A}_{u'}$ of the set $E$ if and only if there exists a one-to-one
correspondence $u' = f(u)$ between their domains $\mathfrak{F}^{(u)}$ and $\mathfrak{F}^{(u')}$ such
that $u'(\xi)$ is identical with $fu(\xi)$. The reader can easily show that
the random variables $\mathsf{P}_u(B)$ and $\mathsf{P}_{u'}(B)$, defined below, are in this
case the same. They are thus determined, in fact, by the partition
$\mathfrak{A}_u = \mathfrak{A}_{u'}$ itself.

To define $\mathsf{P}_u(B)$ we may use the following equation:

$$\mathsf{P}_{\{u \subset A\}}(B) = \mathsf{E}_{\{u \subset A\}} \mathsf{P}_u(B). \tag{1}$$

It is easy to prove that if the set $E^{(u)}$ of all possible values of $u$ is
finite, equation (1) holds true for any choice of $A$ (when $\mathsf{P}_u(B)$
is defined as in § 6, Chap. I). In the general case (in which $\mathsf{P}_u(B)$
is not yet defined) we shall prove that there always exists one
and only one random variable $\mathsf{P}_u(B)$ (except for the matter of
equivalence) which is defined as a function of $u$ and which satisfies
equation (1) for every choice of $A$ from $\mathfrak{F}^{(u)}$ such that
$\mathsf{P}^{(u)}(A) > 0$. The function $\mathsf{P}_u(B)$ of $u$ thus determined to within
equivalence, we call the conditional probability of $B$ with respect
to $u$ (or, for a given $u$). The value of $\mathsf{P}_u(B)$ when $u = a$ we shall
designate by $\mathsf{P}_u(a; B)$.

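The proof of the existence and uniqueness of $\mathsf{P}_u(B)$. If we
multiply (1) by $\mathsf{P}\{u \subset A\} = \mathsf{P}^{(u)}(A)$, we obtain, on the left,

$$\mathsf{P}\{u \subset A\} \, \mathsf{P}_{\{u \subset A\}}(B) = \mathsf{P}(B\{u \subset A\}) = \mathsf{P}(B u^{-1}(A))$$

and, on the right,

$$\mathsf{P}\{u \subset A\} \, \mathsf{E}_{\{u \subset A\}} \mathsf{P}_u(B) = \int_{\{u \subset A\}} \mathsf{P}_u(B) \, \mathsf{P}(dE) = \int_A \mathsf{P}_u(B) \, \mathsf{P}^{(u)}(dE^{(u)}),$$

leading to the formula

$$\mathsf{P}(B u^{-1}(A)) = \int_A \mathsf{P}_u(B) \, \mathsf{P}^{(u)}(dE^{(u)}), \tag{2}$$

and conversely (1) follows from (2). In the case $\mathsf{P}^{(u)}(A) = 0$,
in which case (1) is meaningless, equation (2) becomes trivially
true. Condition (2) is thus equivalent to (1). In accordance with
Property IX of the integral (§ 1, Chap. IV) the random variable
$x$ is uniquely defined (except for equivalence) by means of the
values of the integral

$$\int_A x \, \mathsf{P}(dE)$$

for all sets of $\mathfrak{F}$. Since $\mathsf{P}_u(B)$ is a random variable determined
on the probability field $(\mathfrak{F}^{(u)}, \mathsf{P}^{(u)})$, it follows that formula (2)
uniquely determines this variable $\mathsf{P}_u(B)$ except for equivalence.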

We must still prove the existence of $\mathsf{P}_u(B)$. We shall apply
here the following theorem of Nikodym1:

Let $\mathfrak{F}$ be a Borel field, $\mathsf{P}(A)$ a non-negative completely additive
set function defined on $\mathfrak{F}$ (in the terminology of the probability
theory, a random variable on $(\mathfrak{F}, \mathsf{P})$), and let $Q(A)$ be another
completely additive set function defined on $\mathfrak{F}$, such that from
$Q(A) \neq 0$ follows the inequality $\mathsf{P}(A) > 0$. Then there exists a
function $f(\xi)$ (in the terminology of the theory of probability,
a random variable) which is measurable with respect to $\mathfrak{F}$, and
which satisfies, for each set $A$ of $\mathfrak{F}$, the equation

$$Q(A) = \int_A f(\xi) \, \mathsf{P}(dE).$$

1 O. Nikodym, Sur une généralisation des intégrales de M. J. Radon, Fund.
Math. v. 15, 1930, p. 168 (Theorem III).

In order to apply this theorem to our case, we need to prove
1° that

$$Q(A) = \mathsf{P}(B u^{-1}(A))$$

is a completely additive function on $\mathfrak{F}^{(u)}$, 2° that from $Q(A) \neq 0$
follows the inequality $\mathsf{P}^{(u)}(A) > 0$.

Firstly, 2° follows from

$$0 \leq \mathsf{P}(B u^{-1}(A)) \leq \mathsf{P}(u^{-1}(A)) = \mathsf{P}^{(u)}(A).$$

For the proof of 1° we set

$$A = \sum_n A_n;$$

then

$$u^{-1}(A) = \sum_n u^{-1}(A_n)$$

and

$$B u^{-1}(A) = \sum_n B u^{-1}(A_n).$$

Since $\mathsf{P}$ is completely additive, it follows that

$$\mathsf{P}(B u^{-1}(A)) = \sum_n \mathsf{P}(B u^{-1}(A_n)),$$

which was to be proved.

From the equation (1) follows an important formula (if we
set $A = E^{(u)}$):

$$\mathsf{P}(B) = \mathsf{E}(\mathsf{P}_u(B)). \tag{3}$$
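When $E$ is finite, $\mathsf{P}_u(B)$ reduces to the elementary conditional probability of $B$ on each set $u$ = constant, and formulas (2) and (3) can be checked by enumeration. The sketch below assumes a fair die, the function $u(\xi) = \xi \bmod 3$ and the event $B = \{5, 6\}$; all of these, and the helper names, are hypothetical illustrations.

```python
from fractions import Fraction

# Hypothetical finite field of probability: a fair die.
E = range(1, 7)
P = {xi: Fraction(1, 6) for xi in E}
B = {5, 6}


def u(xi):
    """The function u of xi; its level sets {3, 6}, {1, 4}, {2, 5} partition E."""
    return xi % 3


def prob(S):
    """P(S) for a set S of elementary events."""
    return sum(P[xi] for xi in S)


def P_u(a):
    """P_u(a; B): elementary conditional probability of B on the set {u = a}."""
    cell = {xi for xi in E if u(xi) == a}
    return prob(cell & B) / prob(cell)


# Right side of (3): the expectation of P_u(B) regarded as a function of xi.
expectation = sum(P[xi] * P_u(u(xi)) for xi in E)


def both_sides_of_2(A):
    """Left and right sides of (2) for a set A of values of u."""
    preimage = {xi for xi in E if u(xi) in A}
    lhs = prob(B & preimage)
    rhs = sum(P[xi] * P_u(u(xi)) for xi in preimage)
    return lhs, rhs
```

In this example $\mathsf{P}_u(B)$ takes the values $1/2$, $0$, $1/2$ on the three level sets, and `expectation` equals $\mathsf{P}(B) = 1/3$. Calling `both_sides_of_2` with `{0}`, `{1, 2}` or `{0, 1, 2}` returns two equal fractions.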

Now we shall prove the following two fundamental properties
of conditional probability.

Theorem I. It is almost sure that

$$0 \le \mathsf{P}_u(B) \le 1. \tag{4}$$

Theorem II. If $B$ is decomposed into at most a countable
number of sets $B_n$:

$$B = \sum_n B_n,$$

then the following equality holds almost surely:

$$\mathsf{P}_u(B) = \sum_n \mathsf{P}_u(B_n). \tag{5}$$

These two properties of $\mathsf{P}_u(B)$ correspond to the two characteristic
properties of the probability function $\mathsf{P}(B)$: that
$0 \le \mathsf{P}(B) \le 1$ always, and that $\mathsf{P}(B)$ is completely additive. These
allow us to carry over many other basic properties of the absolute
probability $\mathsf{P}(B)$ to the conditional probability $\mathsf{P}_u(B)$. However,
we must not forget that $\mathsf{P}_u(B)$ is, for a fixed set $B$, a random
variable determined uniquely only to within equivalence.

Proof of Theorem I. If we assume, contrary to the assertion
to be proved, that on a set $M \subset E^{(u)}$ with $\mathsf{P}^{(u)}(M) > 0$, the
inequality $\mathsf{P}_u(B) \ge 1 + \varepsilon$, $\varepsilon > 0$, holds true, then according to
formula (1)

$$\mathsf{P}_{\{u \subset M\}}(B) = \mathsf{E}_{\{u \subset M\}} \mathsf{P}_u(B) \ge 1 + \varepsilon,$$

which is obviously impossible. In the same way we prove that
almost surely $\mathsf{P}_u(B) \ge 0$.

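Proof of Theorem II. From the convergence of the series

$$\sum_n \mathsf{E}|\mathsf{P}_u(B_n)| = \sum_n \mathsf{E}(\mathsf{P}_u(B_n)) = \sum_n \mathsf{P}(B_n) = \mathsf{P}(B)$$

it follows from Property V of mathematical expectation (Chap.
IV, § 2) that the series

$$\sum_n \mathsf{P}_u(B_n)$$

almost surely converges. Since the series

$$\sum_n \mathsf{E}_{\{u \subset A\}}|\mathsf{P}_u(B_n)| = \sum_n \mathsf{E}_{\{u \subset A\}}(\mathsf{P}_u(B_n)) = \sum_n \mathsf{P}_{\{u \subset A\}}(B_n) = \mathsf{P}_{\{u \subset A\}}(B)$$

converges for every choice of the set $A$ such that $\mathsf{P}^{(u)}(A) > 0$,
then from Property V of mathematical expectation just referred
to it follows that for each $A$ of the above kind we have the relation

$$\mathsf{E}_{\{u \subset A\}}\Big(\sum_n \mathsf{P}_u(B_n)\Big) = \sum_n \mathsf{E}_{\{u \subset A\}}(\mathsf{P}_u(B_n)) = \mathsf{P}_{\{u \subset A\}}(B) = \mathsf{E}_{\{u \subset A\}}(\mathsf{P}_u(B)),$$

and from this, equation (5) immediately follows.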

To close this section we shall point out two particular cases.
If, first, $u(\xi) = c$ (a constant), then $\mathsf{P}_c(A) = \mathsf{P}(A)$ almost
surely. If, however, we set $u(\xi) = \xi$, then we obtain at once
that $\mathsf{P}_\xi(A)$ is almost surely equal to one on $A$ and is almost surely
equal to zero on $\bar{A}$. $\mathsf{P}_\xi(A)$ is thus revealed to be the characteristic
function of set $A$.

§  2.   Explanation  of  a  Borel  Paradox 

Let us choose for our basic set $E$ the set of all points on a
spherical surface. Our $\mathfrak{F}$ will be the aggregate of all Borel sets
of the spherical surface. And finally, our $\mathsf{P}(A)$ is to be proportional
to the measure of set $A$. Let us now choose two diametrically
opposite points for our poles, so that each meridian circle will be
uniquely defined by the longitude $\psi$, $0 \le \psi < \pi$. Since $\psi$ varies
from $0$ only to $\pi$ (in other words, we are considering complete
meridian circles, and not merely semicircles), the latitude $\theta$
must vary from $-\pi$ to $+\pi$ (and not from $-\frac{\pi}{2}$ to $+\frac{\pi}{2}$). Borel set
the following problem: Required to determine "the conditional
probability distribution" of latitude $\theta$, $-\pi \le \theta < +\pi$, for a
given longitude $\psi$.

It is easy to calculate that

$$\mathsf{P}_\psi\{\theta_1 \le \theta < \theta_2\} = \frac{1}{4} \int_{\theta_1}^{\theta_2} |\cos\theta| \, d\theta.$$

The probability distribution of $\theta$ for a given $\psi$ is not uniform.

If we assume that the conditional probability distribution of
$\theta$ "with the hypothesis that $\xi$ lies on the given meridian circle"
must be uniform, then we have arrived at a contradiction.

This shows that the concept of a conditional probability with
regard to an isolated given hypothesis whose probability equals 0
is inadmissible. For we can obtain a probability distribution
for $\theta$ on the meridian circle only if we regard this circle as an
element of the decomposition of the entire spherical surface into
meridian circles with the given poles.
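The gap between the two candidate distributions can be made concrete numerically. The following sketch is an illustration only: the helper names and the choice of the band $|\theta| < \pi/6$ are assumptions made here, and the integral is evaluated by a plain midpoint rule. It compares the mass that the density $\frac{1}{4}|\cos\theta|$ gives to a band around the equator with the mass a uniform law on the circle would give.

```python
import math

def conditional_prob(theta1, theta2, steps=100_000):
    """Midpoint-rule value of (1/4) * integral of |cos(theta)| over [theta1, theta2)."""
    h = (theta2 - theta1) / steps
    total = sum(abs(math.cos(theta1 + (i + 0.5) * h)) for i in range(steps))
    return 0.25 * total * h

def uniform_prob(theta1, theta2):
    """Mass a uniform law on the complete meridian circle would assign instead."""
    return (theta2 - theta1) / (2 * math.pi)

total_mass = conditional_prob(-math.pi, math.pi)              # should be close to 1
near_equator = conditional_prob(-math.pi / 6, math.pi / 6)    # close to 1/4
near_equator_uniform = uniform_prob(-math.pi / 6, math.pi / 6)  # exactly 1/6
print(total_mass, near_equator, near_equator_uniform)
```

The band near the equator receives about 1/4 of the mass under the conditional law but only 1/6 under a uniform law, which is the discrepancy the paradox turns on.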

§  3.   Conditional  Probabilities  with  Respect  to  a  Random  Variable 

If $x$ is a random variable and $\mathsf{P}_x(B)$ as a function of $x$ is
measurable in the Borel sense, then $\mathsf{P}_x(B)$ can be defined in an
elementary way. For we can rewrite formula (2) in § 1, to look
as follows:

$$\mathsf{P}(B) \, \mathsf{P}_B^{(x)}(A) = \int_A \mathsf{P}_x(B) \, \mathsf{P}^{(x)}(dE). \tag{1}$$

In this case we obtain from (1) at once that

$$\mathsf{P}(B) \, F_B^{(x)}(a) = \int_{-\infty}^{a} \mathsf{P}_x(a; B) \, dF^{(x)}(a). \tag{2}$$

In accordance with a theorem of Lebesgue² it follows from (2)
that

$$\mathsf{P}_x(a; B) = \mathsf{P}(B) \lim_{h \to 0} \frac{F_B^{(x)}(a+h) - F_B^{(x)}(a)}{F^{(x)}(a+h) - F^{(x)}(a)}, \tag{3}$$

which is always true except for a set $H$ of points $a$ for which
$\mathsf{P}^{(x)}(H) = 0$.

$\mathsf{P}_x(a; B)$ was defined in § 1 except on a set $G$, which is
such that $\mathsf{P}^{(x)}(G) = 0$. If we now regard formula (3) as the
definition of $\mathsf{P}_x(a; B)$ (setting $\mathsf{P}_x(a; B) = 0$ when the limit in the
right hand side of (3) fails to exist), then this new variable
satisfies all requirements of § 1.

² Lebesgue, l. c., 1928, pp. 301-302.

If, besides, the probability densities $f^{(x)}(a)$ and $f_B^{(x)}(a)$ exist
and if $f^{(x)}(a) > 0$, then formula (3) becomes

$$\mathsf{P}_x(a; B) = \mathsf{P}(B) \, \frac{f_B^{(x)}(a)}{f^{(x)}(a)}. \tag{4}$$

Moreover, from formula (3) it follows that the existence of a
limit in (3) and of a probability density $f^{(x)}(a)$ results in the
existence of $f_B^{(x)}(a)$. In that case

$$\mathsf{P}(B) \, f_B^{(x)}(a) \le f^{(x)}(a). \tag{5}$$

If $\mathsf{P}(B) > 0$, then from (4) we have

$$f_B^{(x)}(a) = \frac{\mathsf{P}_x(a; B) \, f^{(x)}(a)}{\mathsf{P}(B)}. \tag{6}$$

In case $f^{(x)}(a) = 0$, then according to (5) $f_B^{(x)}(a) = 0$ and
therefore (6) also holds. If, besides, the distribution of $x$ is continuous,
we have

$$\mathsf{P}(B) = \mathsf{E}(\mathsf{P}_x(B)) = \int_{-\infty}^{+\infty} \mathsf{P}_x(a; B) \, dF^{(x)}(a) = \int_{-\infty}^{+\infty} \mathsf{P}_x(a; B) \, f^{(x)}(a) \, da. \tag{7}$$

From (6) and (7) we obtain

$$f_B^{(x)}(a) = \frac{\mathsf{P}_x(a; B) \, f^{(x)}(a)}{\int_{-\infty}^{+\infty} \mathsf{P}_x(a; B) \, f^{(x)}(a) \, da}. \tag{8}$$

This equation gives us the so-called Bayes' Theorem for continuous
distributions. The assumptions under which this theorem is
proved are these: $\mathsf{P}_x(B)$ is measurable in the Borel sense and at
the point $a$ is defined by formula (3), the distribution of $x$ is
continuous, and at the point $a$ there exists a probability density
$f^{(x)}(a)$.
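A small numerical instance may help in reading formula (8). The sketch below rests on assumptions that are not in the text: $x$ is taken uniform on $[0, 1]$, and $B$ is a hypothetical event whose conditional probability given $x = a$ equals $a$ (a coin of bias $a$ landing heads). The integral in the denominator is approximated by a midpoint rule, which is exact up to rounding for the linear integrand used here.

```python
def midpoint_integral(g, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h

def prior_density(a):
    """f^(x)(a): assumed uniform density on [0, 1]."""
    return 1.0 if 0.0 <= a <= 1.0 else 0.0

def likelihood(a):
    """P_x(a; B): assumed chance of the event B when x = a."""
    return a

def evidence():
    """Formula (7): P(B) as the integral of P_x(a; B) f^(x)(a)."""
    return midpoint_integral(lambda t: likelihood(t) * prior_density(t), 0.0, 1.0)

def posterior_density(a):
    """Formula (8): f_B^(x)(a) = P_x(a; B) f^(x)(a) / P(B)."""
    return likelihood(a) * prior_density(a) / evidence()

p_B = evidence()
print(p_B, posterior_density(0.25), posterior_density(0.75))
```

Under these assumptions $\mathsf{P}(B) = 1/2$ and the conditional density is $2a$ on $[0, 1]$, so the two printed density values are close to 0.5 and 1.5.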

§  4.   Conditional  Mathematical  Expectations 

Let $u$ be an arbitrary function of $\xi$, and $y$ a random variable.
The random variable $\mathsf{E}_u(y)$, representable as a function of $u$ and
satisfying, for any set $A$ of $\mathfrak{F}^{(u)}$ with $\mathsf{P}^{(u)}(A) > 0$, the condition

$$\mathsf{E}_{\{u \subset A\}}(y) = \mathsf{E}_{\{u \subset A\}} \mathsf{E}_u(y), \tag{1}$$

is called (if it exists) the conditional mathematical expectation of
the variable $y$ for known value of $u$.

If we multiply (1) by $\mathsf{P}^{(u)}(A)$, we obtain

$$\int_{\{u \subset A\}} y \, \mathsf{P}(dE) = \int_A \mathsf{E}_u(y) \, \mathsf{P}^{(u)}(dE^{(u)}). \tag{2}$$

Conversely from (2) follows formula (1). In case $\mathsf{P}^{(u)}(A) = 0$,
in which case (1) is meaningless, (2) becomes trivial. In the
same manner as in the case of conditional probability (§ 1) we
can prove that $\mathsf{E}_u(y)$ is determined uniquely (except for
equivalence) by (2).

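The value of $\mathsf{E}_u(y)$ for $u = a$ we shall denote by $\mathsf{E}_u(a; y)$. Let
us also note that $\mathsf{E}_u(y)$, as well as $\mathsf{P}_u(B)$, depends only upon the
partition $\mathfrak{A}_u$ and may be designated by $\mathsf{E}_{\mathfrak{A}_u}(y)$.

The existence of $\mathsf{E}(y)$ is implied in the definition of $\mathsf{E}_u(y)$ (if
we set $A = E^{(u)}$, then $\mathsf{E}_{\{u \subset A\}}(y) = \mathsf{E}(y)$).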

We shall now prove that the existence of $\mathsf{E}(y)$ is also sufficient
for the existence of $\mathsf{E}_u(y)$. For this we only need to prove that by
the theorem of Nikodym (§ 1), the set function

$$Q(A) = \int_{\{u \subset A\}} y \, \mathsf{P}(dE)$$

is completely additive on $\mathfrak{F}^{(u)}$ and absolutely continuous with
respect to $\mathsf{P}^{(u)}(A)$. The first property is proved verbatim as in
the case of conditional probability (§ 1). The second property,
absolute continuity, is contained in the fact that from $Q(A) \ne 0$
the inequality $\mathsf{P}^{(u)}(A) > 0$ must follow. If we assume that
$\mathsf{P}^{(u)}(A) = \mathsf{P}\{u \subset A\} = 0$, it is clear that

$$Q(A) = \int_{\{u \subset A\}} y \, \mathsf{P}(dE) = 0,$$

and our second requirement is thus fulfilled.

If in equation (1) we set $A = E^{(u)}$, we obtain the formula

$$\mathsf{E}(y) = \mathsf{E}\,\mathsf{E}_u(y). \tag{3}$$

We can show further that almost surely

$$\mathsf{E}_u(ay + bz) = a\,\mathsf{E}_u(y) + b\,\mathsf{E}_u(z), \tag{4}$$

where $a$ and $b$ are two arbitrary constants. (The proof is left to
the reader.)

If $u$ and $v$ are two functions of the elementary event $\xi$, then
the couple $(u, v)$ can always be regarded as a function of $\xi$. The
following important equation then holds:

$$\mathsf{E}_u \mathsf{E}_{(u,v)}(y) = \mathsf{E}_u(y). \tag{5}$$

For, $\mathsf{E}_u(y)$ is defined by the relation

$$\mathsf{E}_{\{u \subset A\}}(y) = \mathsf{E}_{\{u \subset A\}} \mathsf{E}_u(y).$$

Therefore we must show that $\mathsf{E}_u \mathsf{E}_{(u,v)}(y)$ satisfies the equation

$$\mathsf{E}_{\{u \subset A\}}(y) = \mathsf{E}_{\{u \subset A\}} \mathsf{E}_u \mathsf{E}_{(u,v)}(y). \tag{6}$$

From the definition of $\mathsf{E}_{(u,v)}(y)$ it follows that

$$\mathsf{E}_{\{u \subset A\}}(y) = \mathsf{E}_{\{u \subset A\}} \mathsf{E}_{(u,v)}(y). \tag{7}$$

From the definition of $\mathsf{E}_u \mathsf{E}_{(u,v)}(y)$ it follows, moreover, that

$$\mathsf{E}_{\{u \subset A\}} \mathsf{E}_{(u,v)}(y) = \mathsf{E}_{\{u \subset A\}} \mathsf{E}_u \mathsf{E}_{(u,v)}(y). \tag{8}$$

Equation (6) results from equations (7) and (8) and thus proves
our statement.
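Equation (5) is easy to verify exhaustively on a finite field of probability. The sketch below is illustrative only: the three biased coin tosses, the functions `u` and `uv`, and the variable `y` are hypothetical choices made for the example. In the finite case the conditional expectation on each set $\{u = a\}$ is the probability-weighted average of $y$ over that set.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical finite field: three tosses of a biased coin (heads with probability 1/3).
p_head = Fraction(1, 3)
space = list(product([0, 1], repeat=3))
prob = {w: (p_head ** sum(w)) * ((1 - p_head) ** (3 - sum(w))) for w in space}

def cond_exp(y, u):
    """Return E_u(y) as a function of the elementary event (finite case)."""
    mass, weighted = defaultdict(Fraction), defaultdict(Fraction)
    for w in space:
        mass[u(w)] += prob[w]
        weighted[u(w)] += prob[w] * y(w)
    return lambda w: weighted[u(w)] / mass[u(w)]

def u(w):
    return w[0]                      # the first toss

def uv(w):
    return (w[0], w[1])              # the couple (u, v)

def y(w):
    return w[0] + 2 * w[1] * w[2] + w[2]

inner = cond_exp(y, uv)              # E_(u,v)(y)
left = cond_exp(inner, u)            # E_u E_(u,v)(y)
right = cond_exp(y, u)               # E_u(y)
tower_holds = all(left(w) == right(w) for w in space)

# Formula (3): E(y) = E E_u(y).
mean_y = sum(prob[w] * y(w) for w in space)
mean_of_cond = sum(prob[w] * right(w) for w in space)
print(tower_holds, mean_y, mean_of_cond)
```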

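If we set $y = \mathsf{P}_\xi(B)$, that is, $y$ equal to one on $B$ and to zero
outside of $B$, then

$$\mathsf{E}_u(y) = \mathsf{P}_u(B), \qquad \mathsf{E}_{(u,v)}(y) = \mathsf{P}_{(u,v)}(B).$$

In this case, from formula (5) we obtain the formula

$$\mathsf{E}_u \mathsf{P}_{(u,v)}(B) = \mathsf{P}_u(B). \tag{9}$$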

The conditional mathematical expectation $\mathsf{E}_u(y)$ may also be
defined directly by means of the corresponding conditional
probabilities. To do this we consider the following sums:

$$S_\lambda(u) = \sum_{k=-\infty}^{+\infty} k\lambda \, \mathsf{P}_u\{k\lambda \le y < (k+1)\lambda\} = \sum_k R_k. \tag{10}$$

If $\mathsf{E}(y)$ exists, the series (10) almost certainly* converges. For
we have from formula (3), of § 1,

$$\mathsf{E}|R_k| = |k\lambda| \, \mathsf{P}\{k\lambda \le y < (k+1)\lambda\},$$

and the convergence of the series

$$\sum_{k=-\infty}^{+\infty} |k\lambda| \, \mathsf{P}\{k\lambda \le y < (k+1)\lambda\} = \sum_k \mathsf{E}|R_k|$$

is the necessary condition for the existence of $\mathsf{E}(y)$ (see Chap. IV,
§ 1). From this convergence it follows that the series (10)
converges almost certainly (see Chap. IV, § 2, V). We can further
show, exactly as in the theory of the Lebesgue integral, that from
the convergence of (10) for some $\lambda$, its convergence for every $\lambda$
follows, and that in the case where series (10) converges, $S_\lambda(u)$
tends to a definite limit as $\lambda \to 0$³. We can then define

$$\mathsf{E}_u(y) = \lim_{\lambda \to 0} S_\lambda(u). \tag{11}$$

* We use almost certainly interchangeably with almost surely.
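The behaviour of the sums (10) as $\lambda \to 0$ can be seen on a discrete law. The sketch below is a simplified illustration under stated assumptions: it treats the unconditional case (a constant $u$, so that $\mathsf{P}_u$ is just $\mathsf{P}$), and the three-point law of $y$ is a hypothetical choice. Each value of $y$ is replaced by the left end $k\lambda$ of the cell containing it, so the sum differs from $\mathsf{E}(y)$ by less than $\lambda$.

```python
import math
from fractions import Fraction

# Hypothetical discrete law of y: value -> probability.
law = {
    Fraction(-7, 4): Fraction(1, 4),
    Fraction(1, 3): Fraction(1, 2),
    Fraction(5, 2): Fraction(1, 4),
}

def S(lam):
    """Sum (10) for this law: sum over k of k*lam * P{k*lam <= y < (k+1)*lam}."""
    total = Fraction(0)
    for value, p in law.items():
        k = math.floor(value / lam)   # the unique k with k*lam <= value < (k+1)*lam
        total += k * lam * p
    return total

expectation = sum(v * p for v, p in law.items())
lams = [Fraction(1, 2 ** j) for j in range(1, 11)]
gaps = [expectation - S(lam) for lam in lams]
print(expectation, gaps[0], gaps[-1])
```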

To prove that the conditional expectation $\mathsf{E}_u(y)$ defined by
relation (11) satisfies the requirements set forth above, we need only
convince ourselves that $\mathsf{E}_u(y)$, as determined by (11), satisfies
equation (1). We prove this fact thus:

$$\mathsf{E}_{\{u \subset A\}} \mathsf{E}_u(y) = \lim_{\lambda \to 0} \mathsf{E}_{\{u \subset A\}} S_\lambda(u) = \lim_{\lambda \to 0} \sum_{k=-\infty}^{+\infty} k\lambda \, \mathsf{P}_{\{u \subset A\}}\{k\lambda \le y < (k+1)\lambda\} = \mathsf{E}_{\{u \subset A\}}(y).$$

The interchange of the mathematical expectation sign with the
limit sign is admissible in this computation, since $S_\lambda(u)$
converges uniformly to $\mathsf{E}_u(y)$ as $\lambda \to 0$ (a simple result of Property V
of mathematical expectation in § 2). The interchange of the
mathematical expectation sign and the summation sign is also
admissible since the series

$$\sum_{k=-\infty}^{+\infty} \mathsf{E}_{\{u \subset A\}}\big\{|k\lambda| \, \mathsf{P}_u[k\lambda \le y < (k+1)\lambda]\big\} = \sum_{k=-\infty}^{+\infty} |k\lambda| \, \mathsf{P}_{\{u \subset A\}}[k\lambda \le y < (k+1)\lambda]$$

converges (an immediate result of Property V of mathematical
expectation).

Instead of (11) we may write

$$\mathsf{E}_u(y) = \int_E y \, \mathsf{P}_u(dE). \tag{12}$$

We must not forget here, however, that (12) is not an integral
in the sense of § 1, Chap. IV, so that (12) is only a symbolic
expression.

³ In this case we consider only a countable sequence of values of $\lambda$; then
all probabilities $\mathsf{P}_u\{k\lambda \le y < (k+1)\lambda\}$ are almost certainly defined for all
these values of $\lambda$.

If $x$ is a random variable then we call the function of $x$ and $a$

$$F_x^{(y)}(a) = \mathsf{P}_x(y < a)$$

the conditional distribution function of $y$ for known $x$.

$F_x^{(y)}(a)$ is almost certainly defined for every $a$. If $a < b$ then
almost certainly

$$F_x^{(y)}(a) \le F_x^{(y)}(b).$$

From (11) and (10) it follows⁴ that almost certainly

$$\mathsf{E}_x(y) = \lim_{\lambda \to 0} \sum_{k=-\infty}^{+\infty} k\lambda \, \big[F_x^{(y)}((k+1)\lambda) - F_x^{(y)}(k\lambda)\big]. \tag{13}$$

This fact can be expressed symbolically by the formula

$$\mathsf{E}_x(y) = \int_{-\infty}^{+\infty} a \, dF_x^{(y)}(a). \tag{14}$$

By means of the new definition of mathematical expectation [(10)
and (11)] it is easy to prove that, for a real function of $u$,

$$\mathsf{E}_u[f(u)\,y] = f(u)\,\mathsf{E}_u(y). \tag{15}$$

⁴ Cf. footnote 3.

Chapter VI

INDEPENDENCE; THE LAW OF LARGE NUMBERS

§  1.   Independence 

Definition 1: Two functions, $u$ and $v$ of $\xi$, are mutually
independent if for any two sets, $A$ of $\mathfrak{F}^{(u)}$, and $B$ of $\mathfrak{F}^{(v)}$, the
following equation holds:

$$\mathsf{P}(u \subset A, v \subset B) = \mathsf{P}(u \subset A)\,\mathsf{P}(v \subset B) = \mathsf{P}^{(u)}(A)\,\mathsf{P}^{(v)}(B). \tag{1}$$

If the sets $E^{(u)}$ and $E^{(v)}$ consist of only a finite number of elements,

$$E^{(u)} = u_1 + u_2 + \cdots + u_n,$$

$$E^{(v)} = v_1 + v_2 + \cdots + v_m,$$

then our definition of independence of $u$ and $v$ is identical with
the definition of independence of the partitions

$$E = \sum_k \{u = u_k\}, \qquad E = \sum_k \{v = v_k\}$$

as in § 5, Chap. I.

For the independence of $u$ and $v$, the following condition is
necessary and sufficient. For any choice of set $A$ in $\mathfrak{F}^{(u)}$ the
following equation holds almost certainly:

$$\mathsf{P}_v(u \subset A) = \mathsf{P}(u \subset A). \tag{2}$$

In the case $\mathsf{P}^{(v)}(B) = 0$, both equations (1) and (2) are satisfied,
and therefore we need only prove their equivalence in the case
$\mathsf{P}^{(v)}(B) > 0$. In this case (1) is equivalent to the relation

$$\mathsf{P}_{\{v \subset B\}}(u \subset A) = \mathsf{P}(u \subset A) \tag{3}$$

and therefore to the relation

$$\mathsf{E}_{\{v \subset B\}} \mathsf{P}_v(u \subset A) = \mathsf{P}(u \subset A). \tag{4}$$

On the other hand, it is obvious that equation (4) follows from
(2). Conversely since $\mathsf{P}_v(u \subset A)$ is uniquely determined by (4)
to within probability zero, then equation (2) follows from (4)
almost certainly.
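For functions with finitely many values, Definition 1 can be tested by brute force over all pairs of sets of values. The sketch below is an illustration under assumed data: the two fair dice and the functions `first`, `second_parity` and `total` are hypothetical choices, and the helper `independent` simply checks equation (1) for every pair $A$, $B$.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical finite example: two fair dice; the elementary event is (first, second).
space = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in space}

def P(event):
    """Probability of the set of elementary events satisfying the predicate."""
    return sum(prob[w] for w in space if event(w))

def subsets(values):
    """All subsets of a finite set of values."""
    values = sorted(values)
    for r in range(len(values) + 1):
        for combo in combinations(values, r):
            yield set(combo)

def independent(u, v):
    """Check equation (1) for every pair of sets A, B of values of u and v."""
    u_vals = {u(w) for w in space}
    v_vals = {v(w) for w in space}
    for A in subsets(u_vals):
        for B in subsets(v_vals):
            joint = P(lambda w: u(w) in A and v(w) in B)
            if joint != P(lambda w: u(w) in A) * P(lambda w: v(w) in B):
                return False
    return True

def first(w):
    return w[0]

def second_parity(w):
    return w[1] % 2

def total(w):
    return w[0] + w[1]

first_vs_parity = independent(first, second_parity)
first_vs_total = independent(first, total)
print(first_vs_parity, first_vs_total)
```

The first die is independent of the parity of the second, but not of the total, since knowing the total restricts the possible values of the first die.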

Definition 2: Let $M$ be a set of functions $u_\mu(\xi)$ of $\xi$. These
functions are called mutually independent in their totality if the
following condition is satisfied. Let $M'$ and $M''$ be two
non-intersecting subsets of $M$, and let $A'$ (or $A''$) be a set from $\mathfrak{F}$
defined by a relation among $u_\mu$ from $M'$ (or $M''$); then we have

$$\mathsf{P}(A'A'') = \mathsf{P}(A')\,\mathsf{P}(A'').$$

The aggregate of all $u_\mu$ of $M'$ (or of $M''$) can be regarded as
coordinates of some function $u'$ (or $u''$). Definition 2 requires
only the independence of $u'$ and $u''$ in the sense of Definition 1 for
each choice of non-intersecting sets $M'$ and $M''$.

If $u_1, u_2, \ldots, u_n$ are mutually independent, then in all cases

$$\mathsf{P}\{u_1 \subset A_1, u_2 \subset A_2, \ldots, u_n \subset A_n\} = \mathsf{P}(u_1 \subset A_1)\,\mathsf{P}(u_2 \subset A_2) \cdots \mathsf{P}(u_n \subset A_n), \tag{5}$$

provided the sets $A_k$ belong to the corresponding $\mathfrak{F}^{(u_k)}$ (proved
by induction). This equation is not in general, however, at all
sufficient for the mutual independence of $u_1, u_2, \ldots, u_n$.

Equation (5) is easily generalized for the case of a countably
infinite product.

From the mutual independence of $u_\mu$ in each finite group
$(u_{\mu_1}, u_{\mu_2}, \ldots, u_{\mu_k})$ it does not necessarily follow that all $u_\mu$ are
mutually independent.

Finally, it is easy to note that the mutual independence of the
functions $u_\mu$ is in reality a property of the corresponding
partitions $\mathfrak{A}_{u_\mu}$. Further, if $u'_\mu$ are single-valued functions of the
corresponding $u_\mu$, then from the mutual independence of $u_\mu$ follows
that of $u'_\mu$.


§  2.   Independent  Random  Variables 

If $x_1, x_2, \ldots, x_n$ are mutually independent random variables
then from equation (2) of the foregoing paragraph follows, in
particular, the formula

$$F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n) = F^{(x_1)}(a_1)\,F^{(x_2)}(a_2) \cdots F^{(x_n)}(a_n). \tag{1}$$

If in this case the field $\mathfrak{F}^{(x_1, x_2, \ldots, x_n)}$ consists only of Borel sets of
the space $R^n$, then condition (1) is also sufficient for the mutual
independence of the variables $x_1, x_2, \ldots, x_n$.

Proof. Let $x' = (x_{i_1}, x_{i_2}, \ldots, x_{i_k})$ and $x'' = (x_{j_1}, x_{j_2}, \ldots, x_{j_m})$ be
two non-intersecting subsystems of the variables $x_1, x_2, \ldots, x_n$.
We must show, on the basis of formula (1), that for every two
Borel sets $A'$ and $A''$ of $R^k$ (or $R^m$) the following equation holds:

$$\mathsf{P}(x' \subset A', x'' \subset A'') = \mathsf{P}(x' \subset A')\,\mathsf{P}(x'' \subset A''). \tag{2}$$

This follows at once from (1) for the sets of the form

$$A' = \{x_{i_1} < a_1, x_{i_2} < a_2, \ldots, x_{i_k} < a_k\},$$

$$A'' = \{x_{j_1} < b_1, x_{j_2} < b_2, \ldots, x_{j_m} < b_m\}.$$

It can be shown that this property of the sets $A'$ and $A''$ is
preserved under formation of sums and differences, from which
equation (2) follows for all Borel sets.

Now let $x = \{x_\mu\}$ be an arbitrary (in general infinite)
aggregate of random variables. If the field $\mathfrak{F}^{(x)}$ coincides with the field
$B\mathfrak{F}^M$ ($M$ is the set of all $\mu$), the aggregate of equations

$$F_{\mu_1 \mu_2 \ldots \mu_n}(a_1, a_2, \ldots, a_n) = F_{\mu_1}(a_1)\,F_{\mu_2}(a_2) \cdots F_{\mu_n}(a_n) \tag{3}$$

is necessary and sufficient for the mutual independence of the
variables $x_\mu$.

The necessity of this condition follows at once from formula
(1). We shall now prove that it is also sufficient. Let $M'$ and $M''$
be two non-intersecting subsets of the set $M$ of all indices $\mu$ and
let $A'$ (or $A''$) be a set of $B\mathfrak{F}^M$ defined by a relation among the $x_\mu$
with indices $\mu$ from $M'$ (or $M''$). We must show that we then have

$$\mathsf{P}(A'A'') = \mathsf{P}(A')\,\mathsf{P}(A''). \tag{4}$$

If $A'$ and $A''$ are cylinder sets then we are dealing with
relations among a finite set of variables $x_\mu$; equation (4) represents
in that case a simple consequence of previous results (Formula
(2)). And since relation (4) holds for sums and differences of
sets $A'$ (or $A''$) also, we have proved (4) for all sets of $B\mathfrak{F}^M$
as well.

Now for every $\mu$ of a set $M$ let there be given a priori a
distribution function $F_\mu(a)$; in that case we can construct a field of
probability such that certain random variables $x_\mu$ in that field
($\mu$ assuming all values in $M$) will be mutually independent, where
$x_\mu$ will have for its distribution function the $F_\mu(a)$ given a priori.
In order to show this it is enough to take $R^M$ for the basic set $E$
and $B\mathfrak{F}^M$ for the field $\mathfrak{F}$, and to define the distribution functions
$F_{\mu_1 \mu_2 \ldots \mu_n}$ (see Chap. III, § 4) by equation (3).

Let us also note that from the mutual independence of each
finite group of variables $x_\mu$ (equation (3)) there follows, as we
have seen above, the mutual independence of all $x_\mu$ on $B\mathfrak{F}^M$. In
more inclusive fields of probability this property may be lost.

To  conclude  this  section,  we  shall  give  a  few  more  criteria  for 
the  independence  of  two  random  variables. 

If two random variables $x$ and $y$ are mutually independent
and if $\mathsf{E}(x)$ and $\mathsf{E}(y)$ are finite then almost certainly

$$\mathsf{E}_x(y) = \mathsf{E}(y), \qquad \mathsf{E}_y(x) = \mathsf{E}(x). \tag{5}$$

These formulas represent an immediate consequence of the
second definition of conditional mathematical expectation
(Formulas (10) and (11) of Chap. V, § 4). Therefore, in the case of
independence both

$$f^2 = \frac{\mathsf{E}[\mathsf{E}_x(y) - \mathsf{E}(y)]^2}{\sigma^2(y)} \qquad \text{and} \qquad g^2 = \frac{\mathsf{E}[\mathsf{E}_y(x) - \mathsf{E}(x)]^2}{\sigma^2(x)}$$

are equal to zero (provided $\sigma^2(x) > 0$ and $\sigma^2(y) > 0$). The
number $f^2$ is called the correlation ratio of $y$ with respect to $x$, and $g^2$
the same for $x$ with respect to $y$ (Pearson).
From (5) it further follows that

$$\mathsf{E}(xy) = \mathsf{E}(x)\,\mathsf{E}(y). \tag{6}$$

To prove this we apply Formula (15) of § 4, Chap. V:

$$\mathsf{E}(xy) = \mathsf{E}\,\mathsf{E}_x(xy) = \mathsf{E}[x\,\mathsf{E}_x(y)] = \mathsf{E}[x\,\mathsf{E}(y)] = \mathsf{E}(y)\,\mathsf{E}(x).$$

Therefore, in the case of independence

$$r = \frac{\mathsf{E}(xy) - \mathsf{E}(x)\,\mathsf{E}(y)}{\sigma(x)\,\sigma(y)}$$

is also equal to zero; $r$, as is well known, is the correlation
coefficient of $x$ and $y$.

If two random variables $x$ and $y$ satisfy equation (6), then
they are called uncorrelated. For the sum

$$s = x_1 + x_2 + \cdots + x_n,$$

where the $x_1, x_2, \ldots, x_n$ are uncorrelated in pairs, we can easily
compute that

$$\sigma^2(s) = \sigma^2(x_1) + \sigma^2(x_2) + \cdots + \sigma^2(x_n). \tag{7}$$

In particular, equation (7) holds for the independent variables $x_k$.
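Equations (6) and (7) require only that the variables be uncorrelated, which is weaker than independence. The sketch below illustrates this with a standard textbook pair that is an assumption of the example, not taken from the text: $x$ uniform on $\{-1, 0, 1\}$ and $y = x^2$. The helper names `E` and `var` are hypothetical.

```python
from fractions import Fraction

# Hypothetical law: x uniform on {-1, 0, 1}, and y = x**2 is a function of x.
law_x = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

def E(g):
    """Expectation of g(x) under the law of x."""
    return sum(p * g(a) for a, p in law_x.items())

def var(g):
    """Variance of g(x)."""
    return E(lambda a: g(a) ** 2) - E(g) ** 2

def x(a):
    return a

def y(a):
    return a * a

uncorrelated = E(lambda a: x(a) * y(a)) == E(x) * E(y)        # equation (6)
additive = var(lambda a: x(a) + y(a)) == var(x) + var(y)      # equation (7)

# Independence would need P(x = 0, y = 0) = P(x = 0) P(y = 0).
p_joint = law_x[0]                                            # {x = 0} implies {y = 0}
p_product = law_x[0] * E(lambda a: 1 if y(a) == 0 else 0)
independent = p_joint == p_product
print(uncorrelated, additive, independent)
```

Here (6) and (7) both hold although $y$ is completely determined by $x$, so the pair is uncorrelated without being independent.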

§  3.   The  Law  of  Large  Numbers 

Random variables $s_n$ of a sequence

$$s_1, s_2, \ldots, s_n, \ldots$$

are called stable, if there exists a numerical sequence

$$d_1, d_2, \ldots, d_n, \ldots$$

such that for any positive $\varepsilon$

$$\mathsf{P}\{|s_n - d_n| \ge \varepsilon\}$$

converges to zero as $n \to \infty$. If all $\mathsf{E}(s_n)$ exist and if we may set

$$d_n = \mathsf{E}(s_n),$$

then the stability is normal.

If all $s_n$ are uniformly bounded, then from

$$\mathsf{P}\{|s_n - d_n| \ge \varepsilon\} \to 0, \qquad n \to +\infty \tag{1}$$

we obtain the relation

$$|\mathsf{E}(s_n) - d_n| \to 0, \qquad n \to +\infty$$

and therefore

$$\mathsf{P}\{|s_n - \mathsf{E}(s_n)| \ge \varepsilon\} \to 0, \qquad n \to +\infty. \tag{2}$$

The stability of a bounded stable sequence is thus necessarily
normal.

Let

$$\mathsf{E}(s_n - \mathsf{E}(s_n))^2 = \sigma^2(s_n) = \sigma_n^2.$$

According to the Tchebycheff inequality,

$$\mathsf{P}\{|s_n - \mathsf{E}(s_n)| \ge \varepsilon\} \le \frac{\sigma_n^2}{\varepsilon^2}.$$

Therefore, the Markov Condition

$$\sigma_n^2 \to 0, \qquad n \to +\infty \tag{3}$$

is sufficient for normal stability.

If $s_n - \mathsf{E}(s_n)$ are uniformly bounded:

$$|s_n - \mathsf{E}(s_n)| \le M,$$

then from the inequality (9) in § 3, Chap. IV,

$$\mathsf{P}\{|s_n - \mathsf{E}(s_n)| \ge \varepsilon\} \ge \frac{\sigma_n^2 - \varepsilon^2}{M^2}.$$

Therefore, in this case the Markov condition (3) is also necessary
for the stability of the $s_n$.

If

$$s_n = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

and the variables $x_n$ are uncorrelated in pairs, we have

$$\sigma_n^2 = \frac{1}{n^2}\{\sigma^2(x_1) + \sigma^2(x_2) + \cdots + \sigma^2(x_n)\}.$$

Therefore, in this case, the following condition is sufficient for
the normal stability of the arithmetical means $s_n$:

$$n^2 \sigma_n^2 = \sigma^2(x_1) + \sigma^2(x_2) + \cdots + \sigma^2(x_n) = o(n^2) \tag{4}$$

(Theorem of Tchebycheff). In particular, condition (4) is
fulfilled if all variables $x_n$ are uniformly bounded.
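The Tchebycheff inequality and the Markov condition can be watched at work on arithmetic means of fair coin tosses, where the tail probability is computable exactly. The sketch below is an illustration under assumed data: the tosses, the value $\varepsilon = 1/10$ and the helper names are hypothetical choices. Here $\sigma_n^2 = 1/(4n)$, so condition (3) holds.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, eps):
    """Exact P{|s_n - 1/2| >= eps} for the mean s_n of n fair coin tosses."""
    half = Fraction(1, 2)
    return sum(
        Fraction(comb(n, k), 2 ** n)
        for k in range(n + 1)
        if abs(Fraction(k, n) - half) >= eps
    )

def chebyshev_bound(n, eps):
    """sigma_n^2 / eps^2 with sigma_n^2 = 1 / (4 n)."""
    return Fraction(1, 4 * n) / eps ** 2

eps = Fraction(1, 10)
rows = [(n, tail_probability(n, eps), chebyshev_bound(n, eps)) for n in (10, 100, 1000)]
for n, exact, bound in rows:
    print(n, float(exact), float(bound))
```

The exact tail stays below the bound for each $n$ and both tend to zero, although the bound is far from sharp.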

This theorem can be generalized for the case of weakly
correlated variables $x_n$. If we assume that the coefficient of
correlation $r_{mn}$¹ of $x_m$ and $x_n$ satisfies the inequality

$$r_{mn} \le c(|n - m|)$$

and that

$$C_n = \sum_{k=0}^{n-1} c(k),$$

then a sufficient condition for normal stability of the arithmetic
means $s_n$ is²

$$C_n \{\sigma^2(x_1) + \sigma^2(x_2) + \cdots + \sigma^2(x_n)\} = o(n^2). \tag{5}$$

In the case of independent summands $x_n$ we can state a
necessary and sufficient condition for the stability of the arithmetic
means $s_n$. For every $x_n$ there exists a constant $m_n$ (the median of
$x_n$) which satisfies the following conditions:

$$\mathsf{P}(x_n < m_n) \le \tfrac{1}{2}, \qquad \mathsf{P}(x_n > m_n) \le \tfrac{1}{2}.$$

We set

$$x_{nk} = \begin{cases} x_k & \text{if } |x_k - m_k| \le n, \\ 0 & \text{otherwise,} \end{cases}$$

$$s_n^* = \frac{x_{n1} + x_{n2} + \cdots + x_{nn}}{n}.$$

Then the relations

$$\sum_{k=1}^{n} \mathsf{P}\{|x_k - m_k| > n\} = \sum_{k=1}^{n} \mathsf{P}(x_{nk} \ne x_k) \to 0, \qquad n \to +\infty \tag{6}$$

$$n^2 \sigma^2(s_n^*) = \sum_{k=1}^{n} \sigma^2(x_{nk}) = o(n^2) \tag{7}$$

are necessary and sufficient for the stability of variables $s_n$³.

¹ It is obvious that $r_{nn} = 1$ always.

² Cf. A. Khintchine, Sur la loi forte des grandes nombres. C. R. de l'acad.
sci. Paris v. 186, 1928, p. 285.

We may here assume the constants $d_n$ to be equal to the $\mathsf{E}(s_n^*)$
so that in the case where

$$\mathsf{E}(s_n^*) - \mathsf{E}(s_n) \to 0, \qquad n \to +\infty$$

(and only in this case) the stability is normal.

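A further generalization of Tchebycheff's theorem is obtained
if we assume that the $s_n$ depend in some way upon the results of
any $n$ trials,

$$\mathfrak{A}_1, \mathfrak{A}_2, \ldots, \mathfrak{A}_n,$$

so that after each definite outcome of all these $n$ trials $s_n$ assumes
a definite value. The general idea of all these theorems, known as
the law of large numbers, consists in the fact that if the
dependence of variables $s_n$ upon each separate trial $\mathfrak{A}_k$ ($k = 1, 2, \ldots, n$)
is very small for a large $n$, then the variables $s_n$ are stable. If we
regard

$$\beta_{nk}^2 = \mathsf{E}\big[\mathsf{E}_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_k}(s_n) - \mathsf{E}_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(s_n)\big]^2$$

as a reasonable measure of the dependence of variables $s_n$ upon
the trial $\mathfrak{A}_k$, then the above-mentioned general idea of the law of
large numbers can be made concrete by the following
considerations⁴.

Let

$$z_{nk} = \mathsf{E}_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_k}(s_n) - \mathsf{E}_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(s_n).$$

Then

$$s_n - \mathsf{E}(s_n) = z_{n1} + z_{n2} + \cdots + z_{nn},$$

$$\mathsf{E}(z_{nk}) = \mathsf{E}\,\mathsf{E}_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_k}(s_n) - \mathsf{E}\,\mathsf{E}_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(s_n) = \mathsf{E}(s_n) - \mathsf{E}(s_n) = 0,$$

$$\sigma^2(z_{nk}) = \mathsf{E}(z_{nk}^2) = \beta_{nk}^2.$$

We can easily compute also that the random variables $z_{nk}$
($k = 1, 2, \ldots, n$) are uncorrelated. For let $i < k$; then⁵

$$\mathsf{E}_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(z_{ni} z_{nk}) = z_{ni}\,\mathsf{E}_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(z_{nk}) = z_{ni}\big[\mathsf{E}_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(s_n) - \mathsf{E}_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(s_n)\big] = 0$$

and therefore

$$\mathsf{E}(z_{ni} z_{nk}) = 0.$$

We thus have

$$\sigma^2(s_n) = \sigma^2(z_{n1}) + \sigma^2(z_{n2}) + \cdots + \sigma^2(z_{nn}) = \beta_{n1}^2 + \beta_{n2}^2 + \cdots + \beta_{nn}^2.$$

Therefore, the condition

$$\beta_{n1}^2 + \beta_{n2}^2 + \cdots + \beta_{nn}^2 \to 0, \qquad n \to +\infty$$

is sufficient for the normal stability of the variables $s_n$.

³ Cf. A. Kolmogorov, Über die Summen durch den Zufall bestimmter
unabhängiger Grössen, Math. Ann. v. 99, 1928, pp. 309-319 (corrections and
notes to this study, v. 102, 1929, pp. 484-488, Theorem VIII and a supplement
on p. 318).

⁴ Cf. A. Kolmogorov, Sur la loi des grandes nombres. Rend. Accad. Lincei
v. 9, 1929, pp. 470-474.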
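The decomposition of $s_n - \mathsf{E}(s_n)$ into the uncorrelated differences $z_{nk}$ can be verified exhaustively for a small number of trials. The sketch below is an illustration under assumed data: the trials are four fair coin tosses and the statistic `s` (a mean with a small interaction term) is a hypothetical choice. Conditioning on the first $k$ trials amounts to averaging over the remaining ones.

```python
from fractions import Fraction
from itertools import product

n = 4
space = list(product([0, 1], repeat=n))    # outcomes of n fair coin trials
prob = Fraction(1, 2 ** n)                 # each outcome is equally likely

def s(w):
    """Hypothetical statistic of the n trials: the mean plus an interaction term."""
    return Fraction(sum(w), n) + Fraction(w[0] * w[-1], n)

def cond_exp_first(k, w):
    """Expectation of s given the first k trials (average over the remaining ones)."""
    tails = list(product([0, 1], repeat=n - k))
    return sum(s(w[:k] + t) for t in tails) / len(tails)

def z(k, w):
    """The difference z_nk between successive conditional expectations."""
    return cond_exp_first(k, w) - cond_exp_first(k - 1, w)

mean = sum(prob * s(w) for w in space)
variance = sum(prob * (s(w) - mean) ** 2 for w in space)
beta_sq = [sum(prob * z(k, w) ** 2 for w in space) for k in range(1, n + 1)]
cross = sum(prob * z(1, w) * z(3, w) for w in space)    # E(z_n1 z_n3)
print(variance, sum(beta_sq), cross)
```

The variance of $s_n$ equals the sum of the $\beta_{nk}^2$ exactly, and the cross term vanishes, as the computation in the text predicts.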

§  4.   Notes  on  the  Concept  of  Mathematical  Expectation 

We have defined the mathematical expectation of a random
variable $x$ as

$$\mathsf{E}(x) = \int_E x \, \mathsf{P}(dE) = \int_{-\infty}^{+\infty} a \, dF^{(x)}(a),$$

where the integral on the right is understood as

$$\mathsf{E}(x) = \int_{-\infty}^{+\infty} a \, dF^{(x)}(a) = \lim \int_{b}^{c} a \, dF^{(x)}(a), \qquad b \to -\infty, \; c \to +\infty. \tag{1}$$

The idea suggests itself to consider the expression

$$\mathsf{E}^*(x) = \lim \int_{-b}^{+b} a \, dF^{(x)}(a), \qquad b \to +\infty \tag{2}$$

as a generalized mathematical expectation. We lose in this case,
of course, several simple properties of mathematical expectation.
For example, in this case the formula

$$\mathsf{E}(x + y) = \mathsf{E}(x) + \mathsf{E}(y)$$

is not always true. In this form the generalization is hardly
admissible. We may add however that, with some restrictive
supplementary conditions, definition (2) becomes entirely natural
and useful.

⁵ Application of Formula (15) in § 4, Chap. V.

We can discuss the problem as follows. Let

$$x_1, x_2, \ldots, x_n, \ldots$$

be a sequence of mutually independent variables, having the same
distribution function $F^{(x)}(a) = F^{(x_n)}(a)$ ($n = 1, 2, \ldots$) as $x$.
Let further

$$s_n = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

We now ask whether there exists a constant $\mathsf{E}^*(x)$ such that
for every $\varepsilon > 0$

$$\lim \mathsf{P}(|s_n - \mathsf{E}^*(x)| > \varepsilon) = 0, \qquad n \to +\infty. \tag{3}$$

The answer is: If such a constant $\mathsf{E}^*(x)$ exists, it is expressed by
Formula (2). The necessary and sufficient condition that Formula
(3) hold consists in the existence of limit (2) and the relation

$$\mathsf{P}(|x| > n) = o\Big(\frac{1}{n}\Big). \tag{4}$$

To prove this we apply the theorem that condition (4) is
necessary and sufficient for the stability of the arithmetic means
$s_n$, where, in the case of stability, we may set⁶

$$d_n = \int_{-n}^{+n} a \, dF^{(x)}(a).$$

If there exists a mathematical expectation in the former sense
(Formula (1)), then condition (4) is always fulfilled⁷. Since in
this case $\mathsf{E}(x) = \mathsf{E}^*(x)$, the condition (3) actually does define a
generalization of the concept of mathematical expectation. For
the generalized mathematical expectation, Properties I - VII
(Chap. IV, § 2) still hold; in general, however, the existence of
$\mathsf{E}^*|x|$ does not follow from the existence of $\mathsf{E}^*(x)$.

⁶ Cf. A. Kolmogorov, Bemerkungen zu meiner Arbeit "Über die Summen
zufälliger Grössen", Math. Ann. v. 102, 1929, pp. 484-488, Theorem XII.

⁷ Ibid., Theorem XIII.

To prove that the new concept of mathematical expectation
is really more general than the previous one, it is sufficient to
give the following example. Set the probability density $f^{(x)}(a)$
equal to

$$f^{(x)}(a) = \frac{C}{(|a| + 2)^2 \ln(|a| + 2)},$$

where the constant $C$ is determined by

$$\int_{-\infty}^{+\infty} f^{(x)}(a) \, da = 1.$$

It is easy to compute that in this case condition (4) is fulfilled.
Formula (2) gives the value

$$\mathsf{E}^*(x) = 0,$$

but the integral

$$\int_{-\infty}^{+\infty} |a| \, dF^{(x)}(a) = \int_{-\infty}^{+\infty} |a| \, f^{(x)}(a) \, da$$

diverges.
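The three claims made for this example can be checked by elementary estimates. The following is a sketch of one possible computation, supplied here as an illustration and not taken from the text; it uses only the density given above and the monotonicity of the logarithm.

```latex
% Condition (4): bound the logarithm from below by its value at a = n.
\mathsf{P}(|x| > n)
  = 2C \int_{n}^{\infty} \frac{da}{(a+2)^2 \ln(a+2)}
  \le \frac{2C}{\ln(n+2)} \int_{n}^{\infty} \frac{da}{(a+2)^2}
  = \frac{2C}{(n+2)\ln(n+2)},
\qquad\text{hence}\qquad
n \, \mathsf{P}(|x| > n) \le \frac{2C}{\ln(n+2)} \to 0 .

% Formula (2): the integrand a f^{(x)}(a) is odd, so every symmetric integral vanishes.
\int_{-b}^{+b} a \, f^{(x)}(a) \, da = 0 \quad\text{for all } b > 0,
\qquad\text{hence}\qquad \mathsf{E}^*(x) = 0 .

% Divergence of the absolute moment: for a >= 2 we have a/(a+2) >= 1/2.
\int_{-b}^{+b} |a| \, f^{(x)}(a) \, da
  \ge 2C \int_{2}^{b} \frac{a \, da}{(a+2)^2 \ln(a+2)}
  \ge C \int_{2}^{b} \frac{da}{(a+2) \ln(a+2)}
  = C \big[ \ln\ln(b+2) - \ln\ln 4 \big] \to +\infty .
```

The divergence is only of order $\ln\ln b$, which is why the tail condition (4) can still hold.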

§  5.    Strong  Law  of  Large  Numbers;  Convergence  of  Series 

The random variables $s_n$ of the sequence

$$s_1, s_2, \ldots, s_n, \ldots$$

are strongly stable if there exists a sequence of numbers

$$d_1, d_2, \ldots, d_n, \ldots$$

such that the random variables

$$s_n - d_n$$

almost certainly tend to zero as $n \to +\infty$. From strong stability
follows, obviously, ordinary stability. If we can choose

$$d_n = \mathsf{E}(s_n),$$

then the strong stability is normal.
In the Tchebycheff case,

$$s_n = \frac{x_1 + x_2 + \cdots + x_n}{n},$$

where the variables $x_n$ are mutually independent. A sufficient⁸
condition for the normal strong stability of the arithmetic means
$s_n$ is the convergence of the series

$$\sum_{n=1}^{\infty} \frac{\sigma^2(x_n)}{n^2}. \tag{1}$$

This condition is the best in the sense that for any series of
constants $b_n$ such that

$$\sum_{n=1}^{\infty} \frac{b_n}{n^2} = +\infty$$

we can build a series of mutually independent random variables
$x_n$ such that

$$\sigma^2(x_n) = b_n$$

and the corresponding arithmetic means $s_n$ will not be strongly
stable.

If all $x_n$ have the same distribution function $F^{(x)}(a)$, then the
existence of the mathematical expectation

$$\mathsf{E}(x) = \int_{-\infty}^{+\infty} a \, dF^{(x)}(a)$$

is necessary and sufficient for the strong stability of $s_n$; the
stability in this case is always normal⁹.
Again,  let 

*£l>  X>2)  •  •  •  )  Xnt  .  .  . 

be  mutually  independent  random  variables.  Then  the  probability 
of  convergence  of  the  series 

fin  (2) 

n=l 

is  equal  either  to  one  or  to  zero.  In  particular,  this  probability 
equals  one  when  both  series 

jjEfoJ      and      JSy-fo) 

n=l  n=l 

converge.  Let  us  further  assume 

yn  =  xn  in  case  [xn\^l, 
yn  =  0  in  case  |  xn  \  >  1. 


8  Cf.  A.  Kolmogorov/  Sur  la  loi  forte  des  grandes  nombres,  C.  R.  Acad.  Sci. 
Paris  v.  191,  1930,  pp.  910-911. 

9  The  proof  of  this  statement  has  not  yet  been  published. 
Then in order that series (2) converge with probability one,
it is necessary and sufficient¹⁰ that the following series converge
simultaneously:

$$\sum_{n=1}^{\infty} P\{|x_n| > 1\}, \quad \sum_{n=1}^{\infty} E(y_n) \quad \text{and} \quad \sum_{n=1}^{\infty} \sigma^2(y_n).$$
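
To make the three conditions concrete, the following Python sketch evaluates exact partial sums of the three series for one hypothetical example, $x_n = \pm 1/n$ with probability $1/2$ each. This example, the function name, and the number of terms are assumptions chosen for illustration and do not appear in the text. Since $|x_n| \le 1$ for every $n$, the truncated variables coincide with the original ones, $y_n = x_n$.

```python
from fractions import Fraction

def three_series_partial_sums(n_terms):
    """Exact partial sums of the three series for x_n = +1/n or -1/n (prob. 1/2 each)."""
    half = Fraction(1, 2)
    tail_prob = Fraction(0)   # running sum of P{|x_n| > 1}
    mean_sum = Fraction(0)    # running sum of E(y_n)
    var_sum = Fraction(0)     # running sum of sigma^2(y_n)
    for n in range(1, n_terms + 1):
        value = Fraction(1, n)
        p_exceed = Fraction(0)                    # |x_n| = 1/n never exceeds 1
        mean = half * value + half * (-value)     # E(y_n) = 0 by symmetry
        second_moment = half * value**2 + half * value**2
        tail_prob += p_exceed
        mean_sum += mean
        var_sum += second_moment - mean**2        # sigma^2(y_n) = 1 / n^2
    return tail_prob, mean_sum, var_sum

tail_prob, mean_sum, var_sum = three_series_partial_sums(200)
```

The first two partial sums are identically zero and the third is a partial sum of $\sum 1/n^2$, which is bounded by $\pi^2/6$. All three series therefore converge, and by the criterion above the series $\sum \pm 1/n$ with independent random signs converges with probability one.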


10  Cf. A. Khintchine and A. Kolmogorov, On the Convergence of Series,
Rec. Math. Soc. Moscow, v. 32, 1925, pp. 668-677.


Appendix 

ZERO-OR-ONE  LAW  IN  THE  THEORY 
OF  PROBABILITY 

We  have  noticed  several  cases  in  which  certain  limiting 
probabilities  are  necessarily  equal  to  zero  or  one.  For  example, 
the  probability  of  convergence  of  a  series  of  independent  random 
variables may assume only these two values¹. We shall prove now
a general theorem including many such cases.

Theorem: Let $x_1, x_2, \ldots, x_n, \ldots$ be any random variables and
let $f(x_1, x_2, \ldots, x_n, \ldots)$ be a Baire function² of the variables
$x_1, x_2, \ldots, x_n, \ldots$ such that the conditional probability

$$P_{x_1, x_2, \ldots, x_n}\{f(x) = 0\}$$

of the relation

$$f(x_1, x_2, \ldots, x_n, \ldots) = 0$$

remains, when the first $n$ variables $x_1, x_2, \ldots, x_n$ are known, equal
to the absolute probability

$$P\{f(x) = 0\} \tag{1}$$

for every $n$. Under these conditions the probability (1) equals
zero or one.

In particular, the assumptions of this theorem are fulfilled if
the variables $x_n$ are mutually independent and if the value of the
function $f(x)$ remains unchanged when only a finite number of
variables are changed.

1  Cf. Chap. VI, § 5. The same thing is true of the probability

$$P\{s_n - d_n \to 0\}$$

in the strong law of large numbers; at least, when the variables $x_n$ are
mutually independent.

2  A Baire function is one which can be obtained by successive passages to
the limit, of sequences of functions, starting with polynomials.

Proof of the Theorem: Let us denote by $A$ the event

$$f(x) = 0.$$

We shall also investigate the field $\mathfrak{K}$ of all events which can be
defined through some relations among a finite number of variables
$x_n$. If event $B$ belongs to $\mathfrak{K}$, then, according to the conditions
of the theorem,

$$P_B(A) = P(A). \tag{2}$$

In the case $P(A) = 0$ our theorem is already true. Let now
$P(A) > 0$. Then from (2) follows the formula

$$P_A(B) = \frac{P_B(A)\,P(B)}{P(A)} = P(B), \tag{3}$$

and therefore $P(B)$ and $P_A(B)$ are two completely additive set
functions, coinciding on $\mathfrak{K}$; therefore they must remain equal to
each other on every set of the Borel extension $B\mathfrak{K}$ of the field $\mathfrak{K}$.
Therefore, in particular,

$$P(A) = P_A(A) = 1,$$

which proves our theorem.
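
The passage from (2) to (3) and the final step can be spelled out as follows. This is only a sketch, using the definition of conditional probability, $P_B(A) = P(AB)/P(B)$ for $P(B) > 0$, and writing $\mathfrak{K}$ for the field of events introduced in the proof. For $P(B) = 0$ both sides of (3) vanish, so (3) holds trivially. The last line assumes that the event $A$ belongs to the Borel extension $B\mathfrak{K}$, which is where the hypothesis that $f$ is a Baire function of the variables $x_n$ is used.

```latex
\begin{align*}
P_B(A) = P(A)
  &\;\Longrightarrow\; P(AB) = P_B(A)\,P(B) = P(A)\,P(B), \\
P_A(B) = \frac{P(AB)}{P(A)}
  &= \frac{P(A)\,P(B)}{P(A)} = P(B)
  \qquad (B \in \mathfrak{K},\ P(A) > 0), \\
A \in B\mathfrak{K}
  &\;\Longrightarrow\; P(A) = P_A(A) = \frac{P(AA)}{P(A)} = 1 .
\end{align*}
```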

Several other cases in which we can state that certain
probabilities can assume only the values one and zero, were discovered
by P. Lévy. See P. Lévy, Sur un théorème de M. Khintchine, Bull.
des Sci. Math. v. 55, 1931, pp. 145-160, Theorem II.